Alerting & Notifications

LogPulse alerting monitors your log data in real time and notifies you when conditions are met. LogPulse provides two complementary alerting systems: query-based alerts (threshold counting, pattern matching, and log absence detection) and AI-powered anomaly detection that automatically learns service baselines and detects deviations using statistical analysis.

Each alert rule defines a condition, an evaluation schedule, one or more notification channels, and an optional escalation policy. When a condition is met, LogPulse creates an alert event and dispatches notifications through the configured channels.

| Alert Type | Description | Example Use Case |
| --- | --- | --- |
| Threshold | Fires when a count or aggregation exceeds a numeric threshold within a time window. | More than 100 errors in 5 minutes |
| Anomaly Detection | AI-powered detection that learns service baselines and fires when metrics deviate beyond expected ranges using z-score analysis. | Unusual spike in error rate compared to the learned baseline for this time of day and day of week |
| Pattern Match | Fires when a log entry matches a specific regex pattern. | Stack trace containing OutOfMemoryError |
| Absence | Fires when no logs matching a query are received within a specified time window. | No heartbeat logs from payment-service for 10 minutes |

Creating Alert Rules

To create an alert rule, navigate to Anomaly Detection in the left sidebar and click Create Rule. The rule builder walks you through the following steps:

1. Name and description: Give the rule a descriptive name and optional description. These appear in notifications, so make them actionable.

2. LPQL condition: Enter the query that defines what to monitor. The query runs on a schedule and its results are evaluated against the condition.

3. Condition type: Select from Threshold, Pattern Match, or Absence. Configure the specific parameters for the selected type. For AI-powered anomaly detection, use the Anomaly Detection section in the sidebar instead.

4. Evaluation schedule: Set how often the query runs (evaluation interval) and the time window for each evaluation. The interval should be shorter than or equal to the window.

5. Severity: Choose from Critical, Warning, or Info. This determines the urgency of notifications and the escalation behavior.

6. Notification channels: Select one or more channels to receive alerts. You can assign different channels per severity level.

7. Click Save Rule to activate the alert.

Example -- Threshold alert rule
Name: High Error Rate - API Gateway
LPQL: source=api-gateway level=error | stats count as error_count
Condition: error_count > 50
Window: 5 minutes
Interval: 1 minute
Severity: Warning
Channels: #ops-alerts (Slack), [email protected] (Email)
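For teams that manage rules as code, the rule above maps naturally to a small config object plus an evaluator. The following is a hypothetical sketch; the field names are illustrative and do not reflect LogPulse's documented API schema:

```python
import operator

# Hypothetical representation of the threshold rule above. Field names are
# illustrative, not LogPulse's actual schema.
rule = {
    "name": "High Error Rate - API Gateway",
    "lpql": "source=api-gateway level=error | stats count as error_count",
    "condition": {"field": "error_count", "operator": ">", "value": 50},
    "window": "5m",
    "interval": "1m",
    "severity": "warning",
    "channels": ["#ops-alerts", "[email protected]"],
}

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}

def should_fire(rule: dict, result_row: dict) -> bool:
    """Evaluate one scheduled query result against the rule's condition."""
    cond = rule["condition"]
    return OPS[cond["operator"]](result_row[cond["field"]], cond["value"])
```

Each evaluation interval, the query result for the window is checked this way; `should_fire(rule, {"error_count": 73})` breaches the `> 50` condition while a count of 12 does not.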

Alert Conditions

Threshold

Threshold alerts fire when a numeric aggregation result exceeds (or falls below) a specified value. The aggregation is computed from the LPQL query results over the configured time window.

Threshold condition examples
# Error count exceeds 100 in 5 minutes
level=error | stats count as cnt | where cnt > 100

# Average response time exceeds 2 seconds
source=api-gateway | stats avg(attributes.response_time_ms) as avg_rt
  | where avg_rt > 2000

# Distinct error sources exceed 5
level=error | stats dc(source) as source_count
  | where source_count > 5

| Parameter | Description | Example |
| --- | --- | --- |
| Operator | Comparison operator: `>`, `>=`, `<`, `<=`, `==`, `!=` | `> 100` |
| Window | Time window for aggregation | `5m`, `15m`, `1h` |
| Interval | How often the condition is evaluated | `1m`, `5m` |
| Consecutive | Number of consecutive breaches before firing | `1` (default), `3` |
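The Consecutive parameter suppresses flapping: the alert fires only after N evaluations in a row breach the threshold, and any passing evaluation resets the count. A minimal sketch of that behavior (class and method names are illustrative):

```python
class ThresholdEvaluator:
    """Sketch of consecutive-breach counting for a threshold condition.

    The alert fires only after `consecutive` evaluations in a row exceed
    the threshold; a non-breaching evaluation resets the counter.
    """

    def __init__(self, threshold: float, consecutive: int = 1):
        self.threshold = threshold
        self.consecutive = consecutive
        self._breaches = 0

    def evaluate(self, value: float) -> bool:
        if value > self.threshold:
            self._breaches += 1
        else:
            self._breaches = 0  # streak broken; start over
        return self._breaches >= self.consecutive
```

With `consecutive=3`, two breaches followed by a normal reading fire nothing; only the third breach in an unbroken run triggers the alert.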

Anomaly Detection

Anomaly detection automatically learns baseline behavior for each monitored service using historical data segmented by day of week and hour of day. When a metric deviates beyond the expected range (calculated using z-score analysis), an anomaly is triggered. The sensitivity can be configured as low, medium, or high per service.

Monitored metrics include: log_count, error_count, error_rate, warn_count, warn_rate, info_count, and info_rate. The detection engine requires at least 5 days of baseline data before it can reliably detect anomalies. During the learning period, the service status shows as "learning" with progress information.
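The z-score check can be sketched as follows. The mapping from sensitivity level to z-score cutoff here is an assumption for illustration, not LogPulse's published values:

```python
from statistics import mean, stdev

# Illustrative sensitivity-to-z-score cutoffs (assumed, not documented values).
SENSITIVITY_Z = {"low": 4.0, "medium": 3.0, "high": 2.0}

def is_anomalous(value: float, baseline_samples: list,
                 sensitivity: str = "medium") -> bool:
    """Check a metric value against historical samples for the same
    (day of week, hour of day) segment."""
    if len(baseline_samples) < 2:
        return False  # still learning; not enough baseline data
    mu = mean(baseline_samples)
    sigma = stdev(baseline_samples)
    if sigma == 0:
        return value != mu  # flat baseline: any change is a deviation
    z = abs(value - mu) / sigma
    return z > SENSITIVITY_Z[sensitivity]
```

An error count of 100 against a baseline hovering around 10 produces a huge z-score and flags an anomaly, while a count of 11 stays well inside the expected range.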

Anomaly detection configuration
Service: api-gateway
Metrics: error_count, error_rate
Sensitivity: medium
Detection interval: 2m
Detection lookback: 5m
Urgent detection: enabled (error rate > 50% with min batch size 20)

Pattern Match

Pattern match alerts fire when a log entry matches a regular expression pattern. Unlike threshold alerts that aggregate over a window, pattern match alerts evaluate each log entry individually and fire on the first match.

Pattern match examples
# Match Java OutOfMemoryError stack traces
Pattern: OutOfMemoryError|java\.lang\.OutOfMemoryError

# Match failed SSH login attempts
Pattern: Failed password for .+ from \d+\.\d+\.\d+\.\d+

# Match credit card number patterns (for PII detection)
Pattern: \b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b

Log Absence

Absence alerts fire when no logs matching a query are received within a specified time window. This is commonly used for heartbeat monitoring, health check verification, and detecting silent failures.

Absence alert examples
# No heartbeat from payment-service for 10 minutes
LPQL: source=payment-service message="heartbeat"
Absence window: 10 minutes

# No successful deployments in 24 hours (for CI/CD monitoring)
LPQL: source=deploy-service level=info message="deployment successful"
Absence window: 24 hours
Warning
Set absence windows longer than your service's expected log interval. A heartbeat that logs every 60 seconds should have an absence window of at least 3-5 minutes to avoid false positives from temporary network delays.
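The absence check itself reduces to comparing the timestamp of the last matching log against the window. A sketch (helper name is illustrative) that shows why a 60-second heartbeat with a 5-minute window tolerates a few missed beats before firing:

```python
from datetime import datetime, timedelta

def absence_breached(last_seen: datetime, now: datetime,
                     window: timedelta) -> bool:
    """Fire when no matching log has arrived within the absence window."""
    return (now - last_seen) > window
```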

Notification Channels

LogPulse supports four notification channel types. Each channel can be used by multiple alert rules, and each rule can send to multiple channels.

Email

Email notifications are sent via Microsoft Graph. Configure the email service with your Azure credentials (tenant ID, client ID, client secret, and sender email address).

Email alerts include the rule name, severity, condition details, a summary of matching logs, and a direct link to the Log Explorer with the alert query pre-loaded. Custom email templates are supported using Mustache syntax or the v2 block-based template format.

Slack

Slack notifications are sent via incoming webhooks. Each channel configuration specifies a webhook URL, a target channel, and optional user or group mentions for escalation.

Slack messages include rich formatting with severity-colored sidebars, collapsible log excerpts, and action buttons for acknowledging or muting the alert directly from Slack.

PagerDuty

PagerDuty integration uses the Events API v2. Configure it with your integration key and a severity mapping that translates LogPulse severity levels to PagerDuty urgencies.

| LogPulse Severity | PagerDuty Severity | PagerDuty Urgency |
| --- | --- | --- |
| Critical | critical | High |
| Warning | warning | High |
| Info | info | Low |

Microsoft Teams (Coming Soon)

Note
Microsoft Teams integration is planned for a future release. In the meantime, you can use the Generic Webhook channel to send notifications to Microsoft Teams via an incoming webhook connector.

Generic Webhook

The generic webhook channel sends a configurable HTTP POST request to any URL. You can customize the request headers and payload template using Handlebars-style placeholders for alert data.

Custom webhook payload template
{
  "alert_name": "{{rule.name}}",
  "severity": "{{alert.severity}}",
  "triggered_at": "{{alert.triggered_at}}",
  "condition": "{{rule.condition}}",
  "matching_count": {{alert.matching_count}},
  "dashboard_url": "{{alert.dashboard_url}}",
  "logs_sample": {{alert.logs_sample_json}}
}
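To see how a template like this expands, here is a minimal sketch of a placeholder renderer. Real Handlebars-style engines support far more (helpers, escaping, conditionals); this one performs plain dotted-path lookups only:

```python
import re

def render(template: str, context: dict) -> str:
    """Expand {{dotted.path}} placeholders from a nested context dict.

    Illustration only: a production renderer would also handle missing
    fields, escaping, and JSON-safe serialization.
    """
    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return re.sub(r"\{\{([\w.]+)\}\}", lookup, template)
```

Note that numeric fields such as `matching_count` appear unquoted in the payload template, so the rendered output stays valid JSON.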

Channel Configuration

The following table lists all configuration parameters for each notification channel.

| Channel | Parameter | Required | Description |
| --- | --- | --- | --- |
| Email | emails | Yes | Array of recipient email addresses |
| Slack | webhookUrl | Yes | Slack incoming webhook URL |
| Slack | channel | No | Override the default webhook channel |
| PagerDuty | integrationKey | Yes | PagerDuty Events API v2 integration key |
| PagerDuty | serviceName | No | Service name for PagerDuty event grouping |
| Webhook | url | Yes | Target URL for the HTTP POST request |
| Webhook | headers | No | Custom HTTP headers (JSON object) |

Escalation Policies

Escalation policies define a sequence of notification actions that execute if an alert is not acknowledged within a specified time. Each level in the escalation chain specifies a delay and one or more channels.

Example escalation policy
Policy: Critical Service Alert
  Level 1 (immediate):
    - Channel: #ops-alerts (Slack)
    - Channel: [email protected] (Email)

  Level 2 (after 15 minutes unacknowledged):
    - Channel: #ops-critical (Slack, mention @oncall-group)
    - Channel: PagerDuty (High urgency)

  Level 3 (after 30 minutes unacknowledged):
    - Channel: [email protected] (Email)
    - Channel: PagerDuty (Critical, escalation to management policy)

Escalation stops as soon as the alert is acknowledged at any level. If the alert is resolved automatically (the condition clears), all pending escalations are cancelled and a resolution notification is sent to all channels that received the original alert.
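The timing behavior can be modeled as a list of (delay, channels) levels, where an acknowledgment cancels every level whose delay has not yet elapsed. A sketch under that assumption (names are illustrative):

```python
from datetime import timedelta
from typing import List, Optional

# The example policy above as (delay, channels) levels.
LEVELS = [
    (timedelta(0), ["#ops-alerts (Slack)", "[email protected] (Email)"]),
    (timedelta(minutes=15), ["#ops-critical (Slack)", "PagerDuty (High)"]),
    (timedelta(minutes=30), ["[email protected] (Email)", "PagerDuty (Critical)"]),
]

def fired_levels(ack_after: Optional[timedelta]) -> List[List[str]]:
    """Return the channel lists notified before acknowledgment.

    If the alert is never acknowledged (ack_after is None), every level
    eventually fires.
    """
    if ack_after is None:
        return [channels for _, channels in LEVELS]
    return [channels for delay, channels in LEVELS if delay < ack_after]
```

Acknowledging after 20 minutes means levels 1 and 2 have fired but level 3 is cancelled; acknowledging within the first 15 minutes stops the chain after level 1.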

Note
Escalation policies can reference on-call rotation schedules from PagerDuty. Configure the PagerDuty integration in Settings to enable rotation-aware escalations.

Alert Lifecycle

Each alert event moves through a defined set of states. Understanding these states helps you manage alerts effectively and configure appropriate automation.

| State | Description | Transitions To |
| --- | --- | --- |
| Active | The alert condition is currently met. Notifications have been dispatched. | Acknowledged, Resolved, Muted |
| Acknowledged | A team member has acknowledged the alert. Escalation is paused. | Resolved, Active (if re-triggered) |
| Resolved | The alert condition is no longer met, or it was manually resolved. | Active (if condition recurs) |
| Muted | The alert is suppressed. No notifications are sent while muted. | Active (when mute expires) |

Alerts can be acknowledged via the LogPulse dashboard, Slack action buttons, PagerDuty, or the API. The acknowledgment includes the user identity and timestamp for audit purposes.

Auto-resolve is enabled by default for threshold alerts: when the condition clears (the metric returns below the threshold), the alert is automatically resolved and a resolution notification is sent. Anomalies are likewise auto-resolved when metrics return to their normal ranges, and you can provide feedback on each anomaly (true positive or false positive) to help improve detection accuracy. Pattern match and absence alerts must be resolved manually or via the API.
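If you automate state changes via the API, it can help to think of the lifecycle as an explicit transition table. A sketch built directly from the states documented above:

```python
# The documented lifecycle as an explicit transition table.
TRANSITIONS = {
    "Active": {"Acknowledged", "Resolved", "Muted"},
    "Acknowledged": {"Resolved", "Active"},   # Active if re-triggered
    "Resolved": {"Active"},                   # Active if condition recurs
    "Muted": {"Active"},                      # Active when mute expires
}

def transition(state: str, new_state: str) -> str:
    """Move an alert to new_state, rejecting undocumented transitions."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state
```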

Muting & Maintenance Windows

Muting Alert Rules

You can mute individual alert rules or entire notification channels. Muted rules continue to evaluate their conditions but do not dispatch notifications. This is useful during known maintenance periods or when investigating a known issue.

To mute a rule, click the Mute button on the rule detail page and set the mute duration. Mutes can be set for 30 minutes, 1 hour, 4 hours, 24 hours, or a custom duration. The mute expires automatically after the specified time.

Maintenance Windows

Maintenance windows mute all alerts (or a filtered subset) during a scheduled period. They are useful for planned deployments, infrastructure upgrades, or recurring maintenance tasks.

Maintenance window configuration
Name: Weekly Database Maintenance
Schedule: Every Sunday 02:00 - 04:00 UTC
Scope: source=db-* (all database services)
Recurrence: Weekly
Mute behavior: Suppress notifications, continue evaluation
Post-window: Re-evaluate all rules, notify if conditions still met

Recurring maintenance windows can be configured on daily, weekly, or monthly schedules. One-time windows are also supported for ad-hoc maintenance events.
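Evaluating whether a timestamp falls inside a recurring weekly window is a simple weekday-and-time check. A sketch for the Sunday 02:00-04:00 UTC example above, assuming timestamps are already in UTC:

```python
from datetime import datetime, time

def in_maintenance_window(ts: datetime) -> bool:
    """True inside the weekly window: Sunday 02:00-04:00 UTC.

    datetime.weekday() numbers Monday as 0, so Sunday is 6.
    """
    return ts.weekday() == 6 and time(2, 0) <= ts.time() < time(4, 0)
```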

Alert History & Analytics

The Alert History page shows all past alert events with their state transitions, notification delivery status, and acknowledgment details. Use it to review incident timelines and measure response performance.

| Metric | Description | Target |
| --- | --- | --- |
| MTTA (Mean Time to Acknowledge) | Average time from alert firing to first acknowledgment. | Under 5 minutes for Critical, under 15 minutes for Warning |
| MTTR (Mean Time to Resolve) | Average time from alert firing to resolution. | Under 30 minutes for Critical, under 2 hours for Warning |
| False Positive Rate | Percentage of alerts that were resolved without action (no actual issue). | Under 10% |
| Alert Volume | Total number of alerts fired per day, week, or month. | Varies by environment; track trends rather than absolutes |
| Notification Delivery Rate | Percentage of notifications successfully delivered across all channels. | Above 99.5% |
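As a reference for how MTTA is derived from alert history, here is a sketch that averages time-to-first-acknowledgment over acknowledged alerts. The event field names are illustrative, not LogPulse's export format:

```python
from datetime import datetime, timedelta
from typing import Optional

def mtta(events: list) -> Optional[timedelta]:
    """Mean time to acknowledge: average of (acked_at - fired_at) over
    acknowledged alerts; unacknowledged alerts are excluded."""
    deltas = [e["acked_at"] - e["fired_at"]
              for e in events if e.get("acked_at") is not None]
    if not deltas:
        return None
    return sum(deltas, timedelta()) / len(deltas)
```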

Alert analytics are available on the Anomaly Detection overview page. Use the time range picker to analyze alert trends over days, weeks, or months. The analytics dashboard includes charts for alert volume by severity, MTTA/MTTR trends, top firing rules, and channel delivery performance.

Best Practices

Follow these guidelines to build an effective and sustainable alerting strategy:

Do not alert on everything. Focus on conditions that require human intervention. If an alert does not require someone to take action, it should be a dashboard metric or a logged event, not an alert.

Use severity levels consistently. Reserve Critical for issues that impact customers or revenue. Use Warning for degraded performance or approaching thresholds. Use Info for awareness items that do not require immediate action.

Test your notification channels. After configuring a new channel, send a test notification to verify delivery. Test regularly to catch expired webhooks, rotated credentials, or changed channel configurations.

Review alerts regularly. Schedule a monthly review of all alert rules. Disable or tune rules with high false positive rates. Remove rules for decommissioned services. Adjust thresholds based on current traffic patterns.

Use escalation policies for critical alerts. Ensure that critical alerts always have a path to a human who can act, even outside business hours.

Tip
Start with a small set of high-value alerts and expand gradually. It is better to have five alerts that always require action than fifty alerts that are routinely ignored.

API Reference

Alert rules, channels, and mute windows can be managed programmatically via the LogPulse API. All endpoints require an API key with full access scope.

Monitored Services

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/api/v1/detect/services` | List all monitored services |
| POST | `/api/v1/detect/services` | Add a new service to monitoring |
| PUT | `/api/v1/detect/services/:id` | Update service detection settings |
| POST | `/api/v1/detect/services/:id/activate` | Activate monitoring for a service |
| POST | `/api/v1/detect/services/:id/dismiss` | Dismiss a service from monitoring |
| GET | `/api/v1/detect/services/:id/baseline-status` | Get baseline learning status |
| GET | `/api/v1/detect/services/:id/metrics-history` | Get metrics history for a service |
| GET | `/api/v1/detect/services/health` | Get health overview of all services |

Anomalies

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/api/v1/detect/anomalies` | List anomalies with filtering |
| GET | `/api/v1/detect/anomalies/:id` | Get a specific anomaly |
| PUT | `/api/v1/detect/anomalies/:id/acknowledge` | Acknowledge an anomaly |
| PUT | `/api/v1/detect/anomalies/:id/dismiss` | Dismiss an anomaly |
| PUT | `/api/v1/detect/anomalies/:id/feedback` | Submit feedback (true/false positive) |
| POST | `/api/v1/detect/anomalies/:id/investigate` | Trigger AI investigation for an anomaly |
| GET | `/api/v1/detect/anomalies/timeline` | Get anomaly timeline data |

Response Channels

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/api/v1/detect/response-channels` | List all response channels |
| POST | `/api/v1/detect/response-channels` | Create a new response channel |
| PUT | `/api/v1/detect/response-channels/:id` | Update a response channel |
| DELETE | `/api/v1/detect/response-channels/:id` | Delete a response channel |

Service Dependencies

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/api/v1/detect/dependencies` | List service dependencies |
| POST | `/api/v1/detect/dependencies` | Create a service dependency |
| DELETE | `/api/v1/detect/dependencies/:id` | Delete a service dependency |
| POST | `/api/v1/detect/dependencies/:id/confirm` | Confirm a discovered dependency |

Example -- Add a monitored service via API
curl -X POST https://api.logpulse.io/api/v1/detect/services \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LOGPULSE_API_KEY" \
  -d '{
    "service": "api-gateway",
    "displayName": "API Gateway",
    "metrics": ["error_count", "error_rate"],
    "sensitivity": "medium",
    "detectionInterval": "2m",
    "detectionLookback": "5m",
    "urgentDetectionEnabled": true,
    "urgentErrorRateThreshold": 50,
    "urgentMinBatchSize": 20
  }'
Example -- Create a response channel via API
curl -X POST https://api.logpulse.io/api/v1/detect/response-channels \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LOGPULSE_API_KEY" \
  -d '{
    "name": "Ops Slack Channel",
    "type": "slack",
    "config": {
      "webhookUrl": "https://hooks.slack.com/services/T.../B.../xxx"
    },
    "minSeverity": "high",
    "enabled": true
  }'