Alerting & Notifications
LogPulse alerting monitors your log data in real time and notifies you when conditions are met. LogPulse provides two complementary alerting systems: query-based alerts (threshold counting, pattern matching, and log absence detection) and AI-powered anomaly detection that automatically learns service baselines and detects deviations using statistical analysis.
Each alert rule defines a condition, an evaluation schedule, one or more notification channels, and an optional escalation policy. When a condition is met, LogPulse creates an alert event and dispatches notifications through the configured channels.
| Alert Type | Description | Example Use Case |
|---|---|---|
| Threshold | Fires when a count or aggregation exceeds a numeric threshold within a time window. | More than 100 errors in 5 minutes |
| Anomaly Detection | AI-powered detection that learns service baselines and fires when metrics deviate beyond expected ranges using z-score analysis. | Unusual spike in error rate compared to the learned baseline for this time of day and day of week |
| Pattern Match | Fires when a log entry matches a specific regex pattern. | Stack trace containing OutOfMemoryError |
| Absence | Fires when no logs matching a query are received within a specified time window. | No heartbeat logs from payment-service for 10 minutes |
Creating Alert Rules
To create an alert rule, navigate to Anomaly Detection in the left sidebar and click Create Rule. The rule builder walks you through the following steps:
1. Name and description: Give the rule a descriptive name and optional description. These appear in notifications, so make them actionable.
2. LPQL condition: Enter the query that defines what to monitor. The query runs on a schedule and its results are evaluated against the condition.
3. Condition type: Select from Threshold, Pattern Match, or Absence. Configure the specific parameters for the selected type. For AI-powered anomaly detection, use the Anomaly Detection section in the sidebar instead.
4. Evaluation schedule: Set how often the query runs (evaluation interval) and the time window each evaluation covers. Keep the interval shorter than or equal to the window so that consecutive evaluations overlap and no logs fall through the gaps.
5. Severity: Choose from Critical, Warning, or Info. This determines the urgency of notifications and the escalation behavior.
6. Notification channels: Select one or more channels to receive alerts. You can assign different channels per severity level.
7. Click Save Rule to activate the alert.
Name: High Error Rate - API Gateway
LPQL: source=api-gateway level=error | stats count as error_count
Condition: error_count > 50
Window: 5 minutes
Interval: 1 minute
Severity: Warning
Channels: #ops-alerts (Slack), [email protected] (Email)
Alert Conditions
Threshold
Threshold alerts fire when a numeric aggregation result exceeds (or falls below) a specified value. The aggregation is computed from the LPQL query results over the configured time window.
# Error count exceeds 100 in 5 minutes
level=error | stats count as cnt | where cnt > 100
# Average response time exceeds 2 seconds
source=api-gateway | stats avg(attributes.response_time_ms) as avg_rt
| where avg_rt > 2000
# Distinct error sources exceed 5
level=error | stats dc(source) as source_count
| where source_count > 5
| Parameter | Description | Example |
|---|---|---|
| Operator | Comparison operator: >, >=, <, <=, ==, != | > 100 |
| Window | Time window for aggregation | 5m, 15m, 1h |
| Interval | How often the condition is evaluated | 1m, 5m |
| Consecutive | Number of consecutive breaches before firing | 1 (default), 3 |
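The Consecutive parameter can be modeled as a small counter: the rule fires only once the condition has been breached in N evaluations in a row, and any non-breach resets the count. A minimal sketch of that logic (illustrative only, not LogPulse internals):

```python
class ThresholdRule:
    """Fires only after `consecutive` breaches in a row (sketch, not LogPulse code)."""

    def __init__(self, threshold: float, consecutive: int = 1):
        self.threshold = threshold
        self.consecutive = consecutive
        self._streak = 0

    def evaluate(self, value: float) -> bool:
        """Feed one evaluation result; return True when the alert should fire."""
        if value > self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # any non-breach resets the streak
        return self._streak >= self.consecutive

rule = ThresholdRule(threshold=100, consecutive=3)
results = [rule.evaluate(v) for v in [120, 130, 90, 150, 160, 170]]
# two breaches, a reset, then three in a row: only the final evaluation fires
```

Setting Consecutive above 1 trades detection latency for fewer flapping alerts on noisy metrics.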
Anomaly Detection
Anomaly detection automatically learns baseline behavior for each monitored service using historical data segmented by day of week and hour of day. When a metric deviates beyond the expected range (calculated using z-score analysis), an anomaly is triggered. The sensitivity can be configured as low, medium, or high per service.
Monitored metrics include: log_count, error_count, error_rate, warn_count, warn_rate, info_count, and info_rate. The detection engine requires at least 5 days of baseline data before it can reliably detect anomalies. During the learning period, the service status shows as "learning" with progress information.
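The z-score comparison described above can be sketched as follows. The sensitivity-to-cutoff mapping here is an illustrative assumption, not LogPulse's actual values:

```python
import statistics

# Assumed sensitivity -> z-score cutoff mapping (illustrative, not LogPulse's real values)
SENSITIVITY_Z = {"low": 4.0, "medium": 3.0, "high": 2.0}

def is_anomalous(current, baseline_samples, sensitivity="medium"):
    """Compare `current` against historical samples for the same
    (day-of-week, hour-of-day) segment using a z-score."""
    mean = statistics.mean(baseline_samples)
    stdev = statistics.stdev(baseline_samples)
    if stdev == 0:
        return current != mean
    z = abs(current - mean) / stdev
    return z > SENSITIVITY_Z[sensitivity]

# Baseline: error_count observed on recent Mondays at 14:00
baseline = [12, 15, 11, 14, 13]
print(is_anomalous(250, baseline))  # far outside the baseline -> True
print(is_anomalous(14, baseline))   # within the baseline -> False
```

A higher sensitivity lowers the cutoff, so smaller deviations trigger anomalies.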
Service: api-gateway
Metrics: error_count, error_rate
Sensitivity: medium
Detection interval: 2m
Detection lookback: 5m
Urgent detection: enabled (error rate > 50% with min batch size 20)
Pattern Match
Pattern match alerts fire when a log entry matches a regular expression pattern. Unlike threshold alerts that aggregate over a window, pattern match alerts evaluate each log entry individually and fire on the first match.
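Per-entry matching amounts to running a compiled regular expression over each log message as it arrives; a minimal sketch using one of the example patterns:

```python
import re

# Compile once, reuse for every log entry
OOM_PATTERN = re.compile(r"OutOfMemoryError|java\.lang\.OutOfMemoryError")

def matches(log_message):
    """Return True if this single entry should fire the pattern alert."""
    return OOM_PATTERN.search(log_message) is not None

print(matches("Exception in thread main java.lang.OutOfMemoryError: heap"))  # True
print(matches("request completed in 35ms"))  # False
```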
# Match Java OutOfMemoryError stack traces
Pattern: OutOfMemoryError|java\.lang\.OutOfMemoryError
# Match failed SSH login attempts
Pattern: Failed password for .+ from \d+\.\d+\.\d+\.\d+
# Match credit card number patterns (for PII detection)
Pattern: \b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b
Log Absence
Absence alerts fire when no logs matching a query are received within a specified time window. This is commonly used for heartbeat monitoring, health check verification, and detecting silent failures.
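Absence detection can be modeled as a last-seen timestamp per alert query: the alert fires once the gap since the last matching log exceeds the window. A sketch under assumed names:

```python
import time

class AbsenceMonitor:
    """Fires when no matching log has arrived for `window_seconds` (sketch)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.last_seen = time.monotonic()

    def record_match(self):
        """Call whenever a log matching the alert query arrives."""
        self.last_seen = time.monotonic()

    def should_fire(self, now=None):
        now = time.monotonic() if now is None else now
        return now - self.last_seen > self.window

monitor = AbsenceMonitor(window_seconds=600)  # 10-minute heartbeat window
monitor.record_match()
print(monitor.should_fire())  # False immediately after a heartbeat
```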
# No heartbeat from payment-service for 10 minutes
LPQL: source=payment-service message="heartbeat"
Absence window: 10 minutes
# No successful deployments in 24 hours (for CI/CD monitoring)
LPQL: source=deploy-service level=info message="deployment successful"
Absence window: 24 hours
Notification Channels
LogPulse supports four notification channel types. Each channel can be used by multiple alert rules, and each rule can send to multiple channels.
Email
Email notifications are sent via Microsoft Graph. Configure the email service with your Azure credentials (tenant ID, client ID, client secret, and sender email address).
Email alerts include the rule name, severity, condition details, a summary of matching logs, and a direct link to the Log Explorer with the alert query pre-loaded. Custom email templates are supported using Mustache syntax or the v2 block-based template format.
Slack
Slack notifications are sent via incoming webhooks. Each channel configuration specifies a webhook URL, a target channel, and optional user or group mentions for escalation.
Slack messages include rich formatting with severity-colored sidebars, collapsible log excerpts, and action buttons for acknowledging or muting the alert directly from Slack.
PagerDuty
PagerDuty integration uses the Events API v2. Configure it with your integration key and a severity mapping that translates LogPulse severity levels to PagerDuty urgencies.
| LogPulse Severity | PagerDuty Severity | PagerDuty Urgency |
|---|---|---|
| Critical | critical | High |
| Warning | warning | High |
| Info | info | Low |
Microsoft Teams (Coming Soon)
Generic Webhook
The generic webhook channel sends a configurable HTTP POST request to any URL. You can customize the request headers and payload template using Handlebars-style placeholders for alert data.
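The placeholder substitution can be approximated with a plain `{{dotted.path}}` lookup into alert data; this sketch handles only simple placeholders, not the full Handlebars feature set:

```python
import re

def render(template, context):
    """Replace {{dotted.path}} placeholders with values from a nested dict."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)

payload = render(
    '{"alert_name": "{{rule.name}}", "severity": "{{alert.severity}}"}',
    {"rule": {"name": "High Error Rate"}, "alert": {"severity": "warning"}},
)
print(payload)  # {"alert_name": "High Error Rate", "severity": "warning"}
```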
{
"alert_name": "{{rule.name}}",
"severity": "{{alert.severity}}",
"triggered_at": "{{alert.triggered_at}}",
"condition": "{{rule.condition}}",
"matching_count": {{alert.matching_count}},
"dashboard_url": "{{alert.dashboard_url}}",
"logs_sample": {{alert.logs_sample_json}}
}
Channel Configuration
The following table lists all configuration parameters for each notification channel.
| Channel | Parameter | Required | Description |
|---|---|---|---|
| Email | emails | Yes | Array of recipient email addresses |
| Slack | webhookUrl | Yes | Slack incoming webhook URL |
| Slack | channel | No | Override the default webhook channel |
| PagerDuty | integrationKey | Yes | PagerDuty Events API v2 integration key |
| PagerDuty | serviceName | No | Service name for PagerDuty event grouping |
| Webhook | url | Yes | Target URL for the HTTP POST request |
| Webhook | headers | No | Custom HTTP headers (JSON object) |
Escalation Policies
Escalation policies define a sequence of notification actions that execute if an alert is not acknowledged within a specified time. Each level in the escalation chain specifies a delay and one or more channels.
Policy: Critical Service Alert
Level 1 (immediate):
- Channel: #ops-alerts (Slack)
- Channel: [email protected] (Email)
Level 2 (after 15 minutes unacknowledged):
- Channel: #ops-critical (Slack, mention @oncall-group)
- Channel: PagerDuty (High urgency)
Level 3 (after 30 minutes unacknowledged):
- Channel: [email protected] (Email)
- Channel: PagerDuty (Critical, escalation to management policy)
Escalation stops as soon as the alert is acknowledged at any level. If the alert is resolved automatically (the condition clears), all pending escalations are cancelled and a resolution notification is sent to all channels that received the original alert.
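The escalation chain above boils down to notification levels keyed by elapsed unacknowledged time, with everything cancelled on acknowledgment. A minimal sketch (delays in minutes, channel names illustrative):

```python
def due_levels(policy, minutes_unacked):
    """Return channels that should have been notified by now.

    `policy` is a list of (delay_minutes, channels) levels. Escalation is
    assumed to stop entirely once the alert is acknowledged, so callers
    simply stop invoking this after an ack.
    """
    channels = []
    for delay, level_channels in policy:
        if minutes_unacked >= delay:
            channels.extend(level_channels)
    return channels

policy = [
    (0, ["#ops-alerts", "[email protected]"]),
    (15, ["#ops-critical", "pagerduty:high"]),
    (30, ["[email protected]", "pagerduty:critical"]),
]
print(due_levels(policy, 16))
# Levels 1 and 2 are due at 16 minutes; level 3 is still pending
```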
Alert Lifecycle
Each alert event moves through a defined set of states. Understanding these states helps you manage alerts effectively and configure appropriate automation.
| State | Description | Transitions To |
|---|---|---|
| Active | The alert condition is currently met. Notifications have been dispatched. | Acknowledged, Resolved, Muted |
| Acknowledged | A team member has acknowledged the alert. Escalation is paused. | Resolved, Active (if re-triggered) |
| Resolved | The alert condition is no longer met, or it was manually resolved. | Active (if condition recurs) |
| Muted | The alert is suppressed. No notifications are sent while muted. | Active (when mute expires) |
Alerts can be acknowledged via the LogPulse dashboard, Slack action buttons, PagerDuty, or the API. The acknowledgment includes the user identity and timestamp for audit purposes.
Auto-resolve is enabled by default for threshold alerts. When the condition clears (the metric returns below the threshold), the alert is automatically resolved and a resolution notification is sent. Anomalies are auto-resolved when metrics return to normal ranges. Pattern match and absence alerts must be resolved manually or via the API. You can also provide feedback on anomalies (true positive or false positive) to help improve detection accuracy.
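The transitions in the lifecycle table can be encoded as a small map and checked before any state change is applied; a sketch:

```python
# Allowed transitions, taken directly from the lifecycle table above
TRANSITIONS = {
    "active": {"acknowledged", "resolved", "muted"},
    "acknowledged": {"resolved", "active"},
    "resolved": {"active"},
    "muted": {"active"},
}

def transition(state, target):
    """Move an alert to `target`, rejecting transitions the table forbids."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"cannot move {state} -> {target}")
    return target

state = "active"
state = transition(state, "acknowledged")  # a team member acks; escalation pauses
state = transition(state, "resolved")      # condition clears
print(state)  # resolved
```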
Muting & Maintenance Windows
Muting Alert Rules
You can mute individual alert rules or entire notification channels. Muted rules continue to evaluate their conditions but do not dispatch notifications. This is useful during known maintenance periods or when investigating a known issue.
To mute a rule, click the Mute button on the rule detail page and set the mute duration. Mutes can be set for 30 minutes, 1 hour, 4 hours, 24 hours, or a custom duration. The mute expires automatically after the specified time.
Maintenance Windows
Maintenance windows mute all alerts (or a filtered subset) during a scheduled period. They are useful for planned deployments, infrastructure upgrades, or recurring maintenance tasks.
Name: Weekly Database Maintenance
Schedule: Every Sunday 02:00 - 04:00 UTC
Scope: source=db-* (all database services)
Recurrence: Weekly
Mute behavior: Suppress notifications, continue evaluation
Post-window: Re-evaluate all rules, notify if conditions still met
Recurring maintenance windows can be configured on daily, weekly, or monthly schedules. One-time windows are also supported for ad-hoc maintenance events.
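A weekly window like the example above reduces to comparing the current UTC weekday and hour against the scheduled range; a sketch for the Sunday 02:00-04:00 UTC case:

```python
from datetime import datetime, timezone

def in_weekly_window(now, weekday, start_hour, end_hour):
    """True while `now` (UTC) falls inside the weekly maintenance window.

    `weekday` follows datetime.weekday(): Monday=0 ... Sunday=6.
    """
    return now.weekday() == weekday and start_hour <= now.hour < end_hour

# 2021-06-06 02:30 UTC is a Sunday, inside the 02:00-04:00 window
sunday_night = datetime(2021, 6, 6, 2, 30, tzinfo=timezone.utc)
print(in_weekly_window(sunday_night, weekday=6, start_hour=2, end_hour=4))  # True
```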
Alert History & Analytics
The Alert History page shows all past alert events with their state transitions, notification delivery status, and acknowledgment details. Use it to review incident timelines and measure response performance.
| Metric | Description | Target |
|---|---|---|
| MTTA (Mean Time to Acknowledge) | Average time from alert firing to first acknowledgment. | Under 5 minutes for Critical, under 15 minutes for Warning |
| MTTR (Mean Time to Resolve) | Average time from alert firing to resolution. | Under 30 minutes for Critical, under 2 hours for Warning |
| False Positive Rate | Percentage of alerts that were resolved without action (no actual issue). | Under 10% |
| Alert Volume | Total number of alerts fired per day, week, or month. | Varies by environment; track trends rather than absolutes |
| Notification Delivery Rate | Percentage of notifications successfully delivered across all channels. | Above 99.5% |
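MTTA and MTTR are plain averages over alert events; a sketch of the computation on hypothetical event timestamps:

```python
from datetime import datetime

def mean_minutes(pairs):
    """Average gap in minutes across a list of (start, end) datetime pairs."""
    gaps = [(end - start).total_seconds() / 60 for start, end in pairs]
    return sum(gaps) / len(gaps)

fired = datetime(2024, 1, 1, 12, 0)
acked = datetime(2024, 1, 1, 12, 4)
resolved = datetime(2024, 1, 1, 12, 24)

mtta = mean_minutes([(fired, acked)])     # 4 minutes: under the Critical target
mttr = mean_minutes([(fired, resolved)])  # 24 minutes: under the Critical target
print(mtta, mttr)
```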
Alert analytics are available on the Anomaly Detection overview page. Use the time range picker to analyze alert trends over days, weeks, or months. The analytics dashboard includes charts for alert volume by severity, MTTA/MTTR trends, top firing rules, and channel delivery performance.
Best Practices
Follow these guidelines to build an effective and sustainable alerting strategy:
Do not alert on everything. Focus on conditions that require human intervention. If an alert does not require someone to take action, it should be a dashboard metric or a logged event, not an alert.
Use severity levels consistently. Reserve Critical for issues that impact customers or revenue. Use Warning for degraded performance or approaching thresholds. Use Info for awareness items that do not require immediate action.
Test your notification channels. After configuring a new channel, send a test notification to verify delivery. Test regularly to catch expired webhooks, rotated credentials, or changed channel configurations.
Review alerts regularly. Schedule a monthly review of all alert rules. Disable or tune rules with high false positive rates. Remove rules for decommissioned services. Adjust thresholds based on current traffic patterns.
Use escalation policies for critical alerts. Ensure that critical alerts always have a path to a human who can act, even outside business hours.
API Reference
Alert rules, channels, and mute windows can be managed programmatically via the LogPulse API. All endpoints require an API key with full access scope.
Monitored Services
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/detect/services | List all monitored services |
| POST | /api/v1/detect/services | Add a new service to monitoring |
| PUT | /api/v1/detect/services/:id | Update service detection settings |
| POST | /api/v1/detect/services/:id/activate | Activate monitoring for a service |
| POST | /api/v1/detect/services/:id/dismiss | Dismiss a service from monitoring |
| GET | /api/v1/detect/services/:id/baseline-status | Get baseline learning status |
| GET | /api/v1/detect/services/:id/metrics-history | Get metrics history for a service |
| GET | /api/v1/detect/services/health | Get health overview of all services |
Anomalies
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/detect/anomalies | List anomalies with filtering |
| GET | /api/v1/detect/anomalies/:id | Get a specific anomaly |
| PUT | /api/v1/detect/anomalies/:id/acknowledge | Acknowledge an anomaly |
| PUT | /api/v1/detect/anomalies/:id/dismiss | Dismiss an anomaly |
| PUT | /api/v1/detect/anomalies/:id/feedback | Submit feedback (true/false positive) |
| POST | /api/v1/detect/anomalies/:id/investigate | Trigger AI investigation for an anomaly |
| GET | /api/v1/detect/anomalies/timeline | Get anomaly timeline data |
Response Channels
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/detect/response-channels | List all response channels |
| POST | /api/v1/detect/response-channels | Create a new response channel |
| PUT | /api/v1/detect/response-channels/:id | Update a response channel |
| DELETE | /api/v1/detect/response-channels/:id | Delete a response channel |
Service Dependencies
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/detect/dependencies | List service dependencies |
| POST | /api/v1/detect/dependencies | Create a service dependency |
| DELETE | /api/v1/detect/dependencies/:id | Delete a service dependency |
| POST | /api/v1/detect/dependencies/:id/confirm | Confirm a discovered dependency |
curl -X POST https://api.logpulse.io/api/v1/detect/services \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"service": "api-gateway",
"displayName": "API Gateway",
"metrics": ["error_count", "error_rate"],
"sensitivity": "medium",
"detectionInterval": "2m",
"detectionLookback": "5m",
"urgentDetectionEnabled": true,
"urgentErrorRateThreshold": 50,
"urgentMinBatchSize": 20
}'
curl -X POST https://api.logpulse.io/api/v1/detect/response-channels \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"name": "Ops Slack Channel",
"type": "slack",
"config": {
"webhookUrl": "https://hooks.slack.com/services/T.../B.../xxx"
},
"minSeverity": "high",
"enabled": true
}'