Service Intelligence

Service Intelligence turns raw telemetry into a service-level view of your estate. You group the entities and log sources that make up a service, define its health as a set of LPQL-based KPIs, map how services depend on one another, and let LogPulse watch each KPI for anomalies. It runs on the same LPQL and ClickHouse engine as search and the SIEM, so a service's health signals are just queries you can click into and pivot from.

Overview

A service is a logical thing you operate (a checkout API, an authentication provider, a payments worker), assembled from the hosts, containers, and log sources that produce its telemetry. Service Intelligence gives each service a single health status rolled up from its KPIs, and collects all services into one hub so you can see, at a glance, what is healthy and what needs attention.

It lives in the Observability hub, alongside Health Checks, Entities, and Maintenance windows. The hub lists every service with its current status and summary counts; opening a service drills into its KPIs, members, dependencies, anomalies, and settings.

Service-level health

One rolled-up status per service, derived from its KPIs, so you watch services rather than scattered metrics.

KPIs from LPQL

Any LPQL search becomes a health signal with thresholds; there is no separate metrics pipeline to feed.

Dependency-aware

Map upstream and downstream services to see how a failure propagates across the estate.

Anomaly-watched

Each KPI is baselined and watched for deviations, so drift is caught before it breaches a threshold.

Defining a Service

Create a service, give it a name and description, then tell LogPulse which telemetry belongs to it. Membership is resolved automatically and stays current as entities come and go, so you define the service once rather than maintaining a static list.

Membership & Scope

A service is scoped one of two ways:

Scope	How membership works
Entity labels	Match entities by their labels (for example team, app, or env). Membership rules select every entity whose labels match, and the set updates automatically as entities are discovered.
Log source	Scope by one or more log-source names. The service is defined by the sources it owns and has no entity members.

Note

Entity-labelled services follow your fleet: a new pod or host that carries the matching labels joins the service on its own, with no edit to the service definition.
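
As a rough illustration of label-based membership, the Python sketch below re-resolves a service's members from a rule each time the fleet changes. It is not LogPulse's implementation: the Entity class, the resolve_members function, and the exact-match, all-labels-must-match semantics are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """Hypothetical stand-in for a discovered host, pod, or container."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


def resolve_members(rule: Dict[str, str], entities: List[Entity]) -> List[str]:
    """Return the names of entities that carry every label in the rule.

    Assumption: a rule matches only when all of its key/value pairs
    are present on the entity with exactly equal values.
    """
    return [
        e.name
        for e in entities
        if all(e.labels.get(key) == value for key, value in rule.items())
    ]


fleet = [
    Entity("checkout-pod-1", {"app": "checkout", "env": "prod", "team": "payments"}),
    Entity("checkout-pod-2", {"app": "checkout", "env": "staging"}),
    Entity("auth-host-1", {"app": "auth", "env": "prod"}),
]
rule = {"app": "checkout", "env": "prod"}

members = resolve_members(rule, fleet)

# A newly discovered pod with matching labels joins without editing the rule.
fleet.append(Entity("checkout-pod-3", {"app": "checkout", "env": "prod"}))
members_after = resolve_members(rule, fleet)
```

The rule is the only thing stored; membership is recomputed from whatever entities exist at the time, which is why the service definition never needs a manual update.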

KPIs

A KPI (Key Performance Indicator) turns an LPQL search into a health signal for the service. The query produces a value, a chosen field is read as the metric, and thresholds map that value to a severity. KPIs are the building blocks of a service's health.

Each KPI defines the search, the value field to read, and its thresholds. For example, a checkout error-rate KPI:

# KPI: checkout 5xx error rate (%)
web.access
| where service == "checkout"
| stats count() as total, count_if(status >= 500) as errors
| eval value = round(100.0 * errors / total, 2)

Read value as the metric, set the threshold direction to above, and set the warning level to 1% and the critical level to 5%. The KPI is evaluated on a schedule; its latest value and severity are shown on the service, and you can chart it over 1h / 6h / 24h / 7d windows.
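
To make the arithmetic in the query concrete, here is the same calculation as a small Python sketch over a list of HTTP status codes. The function name error_rate_percent is hypothetical, and returning None for an empty window is an assumption of this example; how LPQL itself treats a zero total is not stated here.

```python
from typing import List, Optional


def error_rate_percent(statuses: List[int]) -> Optional[float]:
    """Percentage of responses with status >= 500, rounded to 2 decimals.

    Assumption: an empty window yields None instead of dividing by zero.
    """
    total = len(statuses)
    if total == 0:
        return None
    errors = sum(1 for status in statuses if status >= 500)
    return round(100.0 * errors / total, 2)


# 200 requests in the window, 3 of them server errors.
sample = [200] * 197 + [500, 502, 503]
value = error_rate_percent(sample)  # 1.5
```

With the thresholds described above, a value of 1.5 sits over the 1% warning level and under the 5% critical level.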

Thresholds & Severity

A threshold has a direction and up to two breach levels. The direction decides whether high or low values are bad; the levels decide how bad.

Field	Meaning
Direction	"above" flags values over the threshold (error rate, latency); "below" flags values under it (success rate, throughput).
Warning	The value at which the KPI turns Warning: an early signal, not yet an incident.
Critical	The value at which the KPI turns Critical: the service is breaching its objective.
Value field	Which field from the LPQL result is read as the metric.
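
The mapping from value to severity can be sketched in a few lines of Python. This is an illustration under stated assumptions, not LogPulse's evaluator: the classify function is hypothetical, and treating a value exactly equal to a level as a breach is a guess about boundary behavior.

```python
from typing import Optional


def classify(
    value: float,
    direction: str,
    warning: Optional[float] = None,
    critical: Optional[float] = None,
) -> str:
    """Map a KPI value to Healthy, Warning, or Critical.

    direction "above": values at or over a level breach it.
    direction "below": values at or under a level breach it.
    Either level may be omitted (up to two breach levels).
    """
    if direction not in ("above", "below"):
        raise ValueError("direction must be 'above' or 'below'")

    def breached(level: Optional[float]) -> bool:
        if level is None:
            return False
        return value >= level if direction == "above" else value <= level

    # Check the more severe level first so Critical takes precedence.
    if breached(critical):
        return "Critical"
    if breached(warning):
        return "Warning"
    return "Healthy"


# Error rate (%): high is bad.
error_rate_severity = classify(1.5, "above", warning=1.0, critical=5.0)
# Success rate (%): low is bad.
success_rate_severity = classify(97.0, "below", warning=99.0, critical=95.0)
```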

Service Health

A service's overall status is rolled up from its KPIs: the most severe KPI wins, so a single Critical KPI makes the whole service Critical. The Observability hub summarizes the estate with counts of Critical, Warning, and Healthy services, and you can filter the list by status to triage the ones that matter.

Status	Meaning
Healthy	All KPIs are within their thresholds.
Warning	At least one KPI has crossed its warning level, and none has crossed its critical level.
Critical	At least one KPI has crossed its critical level.
Unknown	No KPIs have reported yet (newly created, or awaiting first evaluation).
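
The "most severe KPI wins" rule and the hub's summary counts can be sketched as follows. The names service_status and SEVERITY_ORDER are hypothetical, and the example assumes each service's list holds only the severities of KPIs that have already reported.

```python
from collections import Counter
from typing import Dict, List

# Higher number means more severe.
SEVERITY_ORDER = {"Healthy": 0, "Warning": 1, "Critical": 2}


def service_status(kpi_severities: List[str]) -> str:
    """Roll KPI severities up into one service status."""
    if not kpi_severities:
        return "Unknown"  # nothing has reported yet
    return max(kpi_severities, key=SEVERITY_ORDER.__getitem__)


estate: Dict[str, List[str]] = {
    "checkout": ["Healthy", "Critical", "Warning"],
    "auth": ["Healthy", "Healthy"],
    "payments": ["Warning"],
    "search": [],  # newly created, no evaluations yet
}

statuses = {name: service_status(kpis) for name, kpis in estate.items()}
summary = Counter(statuses.values())

# Filtering by status, as the hub list does for triage.
needs_attention = sorted(n for n, s in statuses.items() if s in ("Critical", "Warning"))
```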

Dependencies

Services rarely fail in isolation. You can record upstream and downstream relationships between services and view them as a dependency graph, so when a service degrades you can see what it relies on and what relies on it. This turns a single red KPI into context: is checkout unhealthy because checkout broke, or because the payments service it depends on did?
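
One way to picture the dependency graph is as an adjacency map walked in both directions. The sketch below is an assumption-laden illustration, not how LogPulse stores or renders dependencies: the service names, the depends_on map, and the relies_on and relied_on_by helpers are all made up for the example.

```python
from collections import deque
from typing import Dict, List

# service -> services it depends on directly (hypothetical data)
depends_on: Dict[str, List[str]] = {
    "checkout": ["payments", "auth"],
    "payments": ["ledger"],
    "auth": [],
    "ledger": [],
}
status = {"checkout": "Critical", "payments": "Critical", "auth": "Healthy", "ledger": "Healthy"}


def relies_on(service: str, graph: Dict[str, List[str]]) -> List[str]:
    """Breadth-first list of everything the service depends on, directly or transitively."""
    seen, order, queue = {service}, [], deque([service])
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


def relied_on_by(service: str, graph: Dict[str, List[str]]) -> List[str]:
    """Everything that depends on the service, found by walking the reversed graph."""
    reverse: Dict[str, List[str]] = {}
    for svc, deps in graph.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(svc)
    return relies_on(service, reverse)


# Is checkout unhealthy on its own, or is something it relies on unhealthy too?
suspects = [s for s in relies_on("checkout", depends_on) if status.get(s) != "Healthy"]
blast_radius = relied_on_by("ledger", depends_on)
```

Here suspects contains only payments, which points the investigation at the payments service before checkout itself.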

KPI Anomaly Detection

Thresholds catch values you can name in advance; anomaly detection catches the ones you cannot. Each KPI is given a baseline (a per-KPI statistical profile of its normal range that accounts for daily and weekly seasonality), and LogPulse flags when the current value departs from it, even while it is still inside the static thresholds.

Anomalies surface on the service's Anomalies tab with the affected entity, the time the deviation started, and a severity. A per-KPI sensitivity setting controls how far from baseline a value must drift before it is flagged: lower it to catch subtle drift, or raise it to flag only dramatic swings.
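
The general idea of a seasonal baseline can be sketched with a simple per-bucket mean and standard deviation. This is a swapped-in textbook technique for illustration, not LogPulse's actual model: bucketing by weekday and hour, the build_baseline and is_anomalous helpers, and treating sensitivity as a multiple of the standard deviation are all assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev
from typing import Dict, List, Tuple

Bucket = Tuple[int, int]  # (weekday, hour) captures daily and weekly seasonality


def build_baseline(history: List[Tuple[datetime, float]]) -> Dict[Bucket, Tuple[float, float]]:
    """Return {(weekday, hour): (mean, stdev)} for buckets with at least two samples."""
    buckets: Dict[Bucket, List[float]] = {}
    for ts, value in history:
        buckets.setdefault((ts.weekday(), ts.hour), []).append(value)
    return {k: (mean(v), stdev(v)) for k, v in buckets.items() if len(v) >= 2}


def is_anomalous(ts: datetime, value: float,
                 baseline: Dict[Bucket, Tuple[float, float]],
                 sensitivity: float) -> bool:
    """Flag values more than `sensitivity` standard deviations from the bucket mean."""
    profile = baseline.get((ts.weekday(), ts.hour))
    if profile is None:
        return False  # no baseline for this time slot yet
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) > sensitivity * sigma


# Four past Mondays at 09:00: checkout error rate (%) hovered around 0.5.
first_monday = datetime(2024, 1, 1, 9)
history = [(first_monday + timedelta(weeks=i), v)
           for i, v in enumerate([0.40, 0.50, 0.45, 0.55])]
baseline = build_baseline(history)

now = datetime(2024, 1, 29, 9)  # the following Monday, 09:00
flag_big_swing = is_anomalous(now, 0.8, baseline, sensitivity=3.0)      # under a 1% warning level
flag_small_drift = is_anomalous(now, 0.6, baseline, sensitivity=3.0)
flag_small_drift_tight = is_anomalous(now, 0.6, baseline, sensitivity=1.5)
```

In this sketch a value of 0.8 is flagged even though it is below a 1% warning threshold, and a value of 0.6 is flagged only once the sensitivity setting is lowered.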

Tip

Anomalies and thresholds are complementary: a threshold tells you a service has breached its objective; an anomaly warns you that it is trending toward a breach before one occurs.

Entity 360

Every member of a service is an entity with its own unified view. Entity 360 combines an observability lens (the services, KPIs, and telemetry tracked for the entity) with a security lens of accumulated risk and reputation. The same view is reachable from the Observability hub and from the security side, so an SRE and an analyst arrive at the same full-context page from different directions. See the Security Monitoring (SIEM) page for the security lens.