Alerting & Notifications
LogPulse alerting monitors your log data in real time and notifies you when conditions are met. LogPulse provides two complementary alerting systems: query-based alerts (threshold counting, pattern matching, and log absence detection) and AI-powered anomaly detection that automatically learns service baselines and detects deviations using statistical analysis.
Each alert rule defines a condition, an evaluation schedule, one or more notification channels, and an optional escalation policy. When a condition is met, LogPulse creates an alert event and dispatches notifications through the configured channels.
| Alert Type | Description | Example Use Case |
|---|---|---|
| Threshold | Fires when a count or aggregation exceeds a numeric threshold within a time window. | More than 100 errors in 5 minutes |
| Anomaly Detection | AI-powered detection that learns service baselines and fires when metrics deviate beyond expected ranges using z-score analysis. | Unusual spike in error rate compared to the learned baseline for this time of day and day of week |
| Pattern Match | Fires when a log entry matches a specific regex pattern. | Stack trace containing OutOfMemoryError |
| Absence | Fires when no logs matching a query are received within a specified time window. | No heartbeat logs from payment-service for 10 minutes |
Creating Alert Rules
To create an alert rule, navigate to Anomaly Detection in the left sidebar and click Create Rule. The rule builder walks you through the following steps:
1. Name and description: Give the rule a descriptive name and optional description. These appear in notifications, so make them actionable.
2. LPQL condition: Enter the query that defines what to monitor. The query runs on a schedule and its results are evaluated against the condition.
3. Condition type: Select from Threshold, Pattern Match, or Absence. Configure the specific parameters for the selected type. For AI-powered anomaly detection, use the Anomaly Detection section in the sidebar instead.
4. Evaluation schedule: Set how often the query runs (evaluation interval) and the time window each evaluation covers. Keep the interval shorter than or equal to the window so that consecutive evaluations overlap and no logs fall through the gaps.
5. Severity: Choose from Critical, Warning, or Info. This determines the urgency of notifications and the escalation behavior.
6. Notification channels: Select one or more channels to receive alerts. You can assign different channels per severity level.
7. Click Save Rule to activate the alert.
Name: High Error Rate - API Gateway
LPQL: source=api-gateway level=error | stats count as error_count
Condition: error_count > 50
Window: 5 minutes
Interval: 1 minute
Severity: Warning
Channels: #ops-alerts (Slack), [email protected] (Email)
Alert Conditions
Threshold
Threshold alerts fire when a numeric aggregation result exceeds (or falls below) a specified value. The aggregation is computed from the LPQL query results over the configured time window.
# Error count exceeds 100 in 5 minutes
level=error | stats count as cnt | where cnt > 100
# Average response time exceeds 2 seconds
source=api-gateway | stats avg(attributes.response_time_ms) as avg_rt
| where avg_rt > 2000
# Distinct error sources exceed 5
level=error | stats dc(source) as source_count
| where source_count > 5
| Parameter | Description | Example |
|---|---|---|
| Operator | Comparison operator: >, >=, <, <=, ==, != | > 100 |
| Window | Time window for aggregation | 5m, 15m, 1h |
| Interval | How often the condition is evaluated | 1m, 5m |
| Consecutive | Number of consecutive breaches before firing | 1 (default), 3 |
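The Consecutive parameter can be modeled as a small counter: the rule fires only once the condition has been breached in N evaluations in a row, and any non-breach resets the count. A minimal sketch of that logic (illustrative only, not LogPulse internals):

```python
class ThresholdRule:
    """Fires only after `consecutive` breaches in a row (sketch, not LogPulse code)."""

    def __init__(self, threshold: float, consecutive: int = 1):
        self.threshold = threshold
        self.consecutive = consecutive
        self._streak = 0

    def evaluate(self, value: float) -> bool:
        """Feed one evaluation result; return True when the alert should fire."""
        if value > self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # any non-breach resets the streak
        return self._streak >= self.consecutive

rule = ThresholdRule(threshold=100, consecutive=3)
results = [rule.evaluate(v) for v in [120, 130, 90, 150, 160, 170]]
# two breaches, a reset, then three in a row: only the final evaluation fires
```

Setting Consecutive above 1 trades detection latency for fewer flapping alerts on noisy metrics.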
Anomaly Detection
Anomaly detection automatically learns baseline behavior for each monitored service using historical data segmented by day of week and hour of day. When a metric deviates beyond the expected range (calculated using z-score analysis), an anomaly is triggered. The sensitivity can be configured as low, medium, or high per service.
Monitored metrics include: log_count, error_count, error_rate, warn_count, warn_rate, info_count, and info_rate. The detection engine requires at least 5 days of baseline data before it can reliably detect anomalies. During the learning period, the service status shows as "learning" with progress information.
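The z-score comparison described above can be sketched as follows. The sensitivity-to-cutoff mapping here is an illustrative assumption, not LogPulse's actual values:

```python
import statistics

# Assumed sensitivity -> z-score cutoff mapping (illustrative, not LogPulse's real values)
SENSITIVITY_Z = {"low": 4.0, "medium": 3.0, "high": 2.0}

def is_anomalous(current, baseline_samples, sensitivity="medium"):
    """Compare `current` against historical samples for the same
    (day-of-week, hour-of-day) segment using a z-score."""
    mean = statistics.mean(baseline_samples)
    stdev = statistics.stdev(baseline_samples)
    if stdev == 0:
        return current != mean
    z = abs(current - mean) / stdev
    return z > SENSITIVITY_Z[sensitivity]

# Baseline: error_count observed on recent Mondays at 14:00
baseline = [12, 15, 11, 14, 13]
print(is_anomalous(250, baseline))  # far outside the baseline -> True
print(is_anomalous(14, baseline))   # within the baseline -> False
```

A higher sensitivity lowers the cutoff, so smaller deviations trigger anomalies.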
Service: api-gateway
Metrics: error_count, error_rate
Sensitivity: medium
Detection interval: 2m
Detection lookback: 5m
Urgent detection: enabled (error rate > 50% with min batch size 20)
Pattern Match
Pattern match alerts fire when a log entry matches a regular expression pattern. Unlike threshold alerts that aggregate over a window, pattern match alerts evaluate each log entry individually and fire on the first match.
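Per-entry matching amounts to running a compiled regular expression over each log message as it arrives; a minimal sketch using one of the example patterns:

```python
import re

# Compile once, reuse for every log entry
OOM_PATTERN = re.compile(r"OutOfMemoryError|java\.lang\.OutOfMemoryError")

def matches(log_message):
    """Return True if this single entry should fire the pattern alert."""
    return OOM_PATTERN.search(log_message) is not None

print(matches("Exception in thread main java.lang.OutOfMemoryError: heap"))  # True
print(matches("request completed in 35ms"))  # False
```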
# Match Java OutOfMemoryError stack traces
Pattern: OutOfMemoryError|java\.lang\.OutOfMemoryError
# Match failed SSH login attempts
Pattern: Failed password for .+ from \d+\.\d+\.\d+\.\d+
# Match credit card number patterns (for PII detection)
Pattern: \b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b
Log Absence
Absence alerts fire when no logs matching a query are received within a specified time window. This is commonly used for heartbeat monitoring, health check verification, and detecting silent failures.
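Absence detection can be modeled as a last-seen timestamp per alert query: the alert fires once the gap since the last matching log exceeds the window. A sketch under assumed names:

```python
import time

class AbsenceMonitor:
    """Fires when no matching log has arrived for `window_seconds` (sketch)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.last_seen = time.monotonic()

    def record_match(self):
        """Call whenever a log matching the alert query arrives."""
        self.last_seen = time.monotonic()

    def should_fire(self, now=None):
        now = time.monotonic() if now is None else now
        return now - self.last_seen > self.window

monitor = AbsenceMonitor(window_seconds=600)  # 10-minute heartbeat window
monitor.record_match()
print(monitor.should_fire())  # False immediately after a heartbeat
```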
# No heartbeat from payment-service for 10 minutes
LPQL: source=payment-service message="heartbeat"
Absence window: 10 minutes
# No successful deployments in 24 hours (for CI/CD monitoring)
LPQL: source=deploy-service level=info message="deployment successful"
Absence window: 24 hours
Notification Channels
LogPulse supports four notification channel types. Each channel can be used by multiple alert rules, and each rule can send to multiple channels.
Email
Email notifications are sent via Microsoft Graph. Configure the email service with your Azure credentials (tenant ID, client ID, client secret, and sender email address).
Email alerts include the rule name, severity, condition details, a summary of matching logs, and a direct link to the Log Explorer with the alert query pre-loaded. Custom email templates are supported using Mustache syntax or the v2 block-based template format.
Slack
Slack notifications are sent via incoming webhooks. Each channel configuration specifies a webhook URL, a target channel, and optional user or group mentions for escalation.
Slack messages include rich formatting with severity-colored sidebars, collapsible log excerpts, and action buttons for acknowledging or muting the alert directly from Slack.
PagerDuty
PagerDuty integration uses the Events API v2. Configure it with your integration key and a severity mapping that translates LogPulse severity levels to PagerDuty urgencies.
| LogPulse Severity | PagerDuty Severity | PagerDuty Urgency |
|---|---|---|
| Critical | critical | High |
| Warning | warning | High |
| Info | info | Low |
Microsoft Teams (Coming Soon)
Generic Webhook
The generic webhook channel sends a configurable HTTP POST request to any URL. You can customize the request headers and payload template using Handlebars-style placeholders for alert data.
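The placeholder substitution can be approximated with a plain `{{dotted.path}}` lookup into alert data; this sketch handles only simple placeholders, not the full Handlebars feature set:

```python
import re

def render(template, context):
    """Replace {{dotted.path}} placeholders with values from a nested dict."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)

payload = render(
    '{"alert_name": "{{rule.name}}", "severity": "{{alert.severity}}"}',
    {"rule": {"name": "High Error Rate"}, "alert": {"severity": "warning"}},
)
print(payload)  # {"alert_name": "High Error Rate", "severity": "warning"}
```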
{
"alert_name": "{{rule.name}}",
"severity": "{{alert.severity}}",
"triggered_at": "{{alert.triggered_at}}",
"condition": "{{rule.condition}}",
"matching_count": {{alert.matching_count}},
"dashboard_url": "{{alert.dashboard_url}}",
"logs_sample": {{alert.logs_sample_json}}
}
Channel Configuration
The following table lists all configuration parameters for each notification channel.
| Channel | Parameter | Required | Description |
|---|---|---|---|
| Email | emails | Yes | Array of recipient email addresses |
| Slack | webhookUrl | Yes | Slack incoming webhook URL |
| Slack | channel | No | Override the default webhook channel |
| PagerDuty | integrationKey | Yes | PagerDuty Events API v2 integration key |
| PagerDuty | serviceName | No | Service name for PagerDuty event grouping |
| Webhook | url | Yes | Target URL for the HTTP POST request |
| Webhook | headers | No | Custom HTTP headers (JSON object) |
Escalation Policies
Escalation policies define a sequence of notification actions that execute if an alert is not acknowledged within a specified time. Each level in the escalation chain specifies a delay and one or more channels.
Policy: Critical Service Alert
Level 1 (immediate):
- Channel: #ops-alerts (Slack)
- Channel: [email protected] (Email)
Level 2 (after 15 minutes unacknowledged):
- Channel: #ops-critical (Slack, mention @oncall-group)
- Channel: PagerDuty (High urgency)
Level 3 (after 30 minutes unacknowledged):
- Channel: [email protected] (Email)
- Channel: PagerDuty (Critical, escalation to management policy)
Escalation stops as soon as the alert is acknowledged at any level. If the alert is resolved automatically (the condition clears), all pending escalations are cancelled and a resolution notification is sent to all channels that received the original alert.
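The escalation chain above boils down to notification levels keyed by elapsed unacknowledged time, with everything cancelled on acknowledgment. A minimal sketch (delays in minutes, channel names illustrative):

```python
def due_levels(policy, minutes_unacked):
    """Return channels that should have been notified by now.

    `policy` is a list of (delay_minutes, channels) levels. Escalation is
    assumed to stop entirely once the alert is acknowledged, so callers
    simply stop invoking this after an ack.
    """
    channels = []
    for delay, level_channels in policy:
        if minutes_unacked >= delay:
            channels.extend(level_channels)
    return channels

policy = [
    (0, ["#ops-alerts", "[email protected]"]),
    (15, ["#ops-critical", "pagerduty:high"]),
    (30, ["[email protected]", "pagerduty:critical"]),
]
print(due_levels(policy, 16))
# Levels 1 and 2 are due at 16 minutes; level 3 is still pending
```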
Alert Lifecycle
Each alert event moves through a defined set of states. Understanding these states helps you manage alerts effectively and configure appropriate automation.
| State | Description | Transitions To |
|---|---|---|
| Active | The alert condition is currently met. Notifications have been dispatched. | Acknowledged, Resolved, Muted |
| Acknowledged | A team member has acknowledged the alert. Escalation is paused. | Resolved, Active (if re-triggered) |
| Resolved | The alert condition is no longer met, or it was manually resolved. | Active (if condition recurs) |
| Muted | The alert is suppressed. No notifications are sent while muted. | Active (when mute expires) |
Alerts can be acknowledged via the LogPulse dashboard, Slack action buttons, PagerDuty, or the API. The acknowledgment includes the user identity and timestamp for audit purposes.
Auto-resolve is enabled by default for threshold alerts. When the condition clears (the metric returns below the threshold), the alert is automatically resolved and a resolution notification is sent. Anomalies are auto-resolved when metrics return to normal ranges. Pattern match and absence alerts must be resolved manually or via the API. You can also provide feedback on anomalies (true positive or false positive) to help improve detection accuracy.
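The transitions in the lifecycle table can be encoded as a small map and checked before any state change is applied; a sketch:

```python
# Allowed transitions, taken directly from the lifecycle table above
TRANSITIONS = {
    "active": {"acknowledged", "resolved", "muted"},
    "acknowledged": {"resolved", "active"},
    "resolved": {"active"},
    "muted": {"active"},
}

def transition(state, target):
    """Move an alert to `target`, rejecting transitions the table forbids."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"cannot move {state} -> {target}")
    return target

state = "active"
state = transition(state, "acknowledged")  # a team member acks; escalation pauses
state = transition(state, "resolved")      # condition clears
print(state)  # resolved
```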
Muting & Maintenance Windows
Muting Alert Rules
You can mute individual alert rules or entire notification channels. Muted rules continue to evaluate their conditions but do not dispatch notifications. This is useful during known maintenance periods or when investigating a known issue.
To mute a rule, click the Mute button on the rule detail page and set the mute duration. Mutes can be set for 30 minutes, 1 hour, 4 hours, 24 hours, or a custom duration. The mute expires automatically after the specified time.
Maintenance Windows
Maintenance windows mute all alerts (or a filtered subset) during a scheduled period. They are useful for planned deployments, infrastructure upgrades, or recurring maintenance tasks.
Name: Weekly Database Maintenance
Schedule: Every Sunday 02:00 - 04:00 UTC
Scope: source=db-* (all database services)
Recurrence: Weekly
Mute behavior: Suppress notifications, continue evaluation
Post-window: Re-evaluate all rules, notify if conditions still met
Recurring maintenance windows can be configured on daily, weekly, or monthly schedules. One-time windows are also supported for ad-hoc maintenance events.
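A weekly window like the example above reduces to comparing the current UTC weekday and hour against the scheduled range; a sketch for the Sunday 02:00-04:00 UTC case:

```python
from datetime import datetime, timezone

def in_weekly_window(now, weekday, start_hour, end_hour):
    """True while `now` (UTC) falls inside the weekly maintenance window.

    `weekday` follows datetime.weekday(): Monday=0 ... Sunday=6.
    """
    return now.weekday() == weekday and start_hour <= now.hour < end_hour

# 2021-06-06 02:30 UTC is a Sunday, inside the 02:00-04:00 window
sunday_night = datetime(2021, 6, 6, 2, 30, tzinfo=timezone.utc)
print(in_weekly_window(sunday_night, weekday=6, start_hour=2, end_hour=4))  # True
```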
Alert History & Analytics
The Alert History page shows all past alert events with their state transitions, notification delivery status, and acknowledgment details. Use it to review incident timelines and measure response performance.
| Metric | Description | Target |
|---|---|---|
| MTTA (Mean Time to Acknowledge) | Average time from alert firing to first acknowledgment. | Under 5 minutes for Critical, under 15 minutes for Warning |
| MTTR (Mean Time to Resolve) | Average time from alert firing to resolution. | Under 30 minutes for Critical, under 2 hours for Warning |
| False Positive Rate | Percentage of alerts that were resolved without action (no actual issue). | Under 10% |
| Alert Volume | Total number of alerts fired per day, week, or month. | Varies by environment; track trends rather than absolutes |
| Notification Delivery Rate | Percentage of notifications successfully delivered across all channels. | Above 99.5% |
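MTTA and MTTR are plain averages over alert events; a sketch of the computation on hypothetical event timestamps:

```python
from datetime import datetime

def mean_minutes(pairs):
    """Average gap in minutes across a list of (start, end) datetime pairs."""
    gaps = [(end - start).total_seconds() / 60 for start, end in pairs]
    return sum(gaps) / len(gaps)

fired = datetime(2024, 1, 1, 12, 0)
acked = datetime(2024, 1, 1, 12, 4)
resolved = datetime(2024, 1, 1, 12, 24)

mtta = mean_minutes([(fired, acked)])     # 4 minutes: under the Critical target
mttr = mean_minutes([(fired, resolved)])  # 24 minutes: under the Critical target
print(mtta, mttr)
```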
Alert analytics are available on the Anomaly Detection overview page. Use the time range picker to analyze alert trends over days, weeks, or months. The analytics dashboard includes charts for alert volume by severity, MTTA/MTTR trends, top firing rules, and channel delivery performance.
Best Practices
Follow these guidelines to build an effective and sustainable alerting strategy:
Do not alert on everything. Focus on conditions that require human intervention. If an alert does not require someone to take action, it should be a dashboard metric or a logged event, not an alert.
Use severity levels consistently. Reserve Critical for issues that impact customers or revenue. Use Warning for degraded performance or approaching thresholds. Use Info for awareness items that do not require immediate action.
Test your notification channels. After configuring a new channel, send a test notification to verify delivery. Test regularly to catch expired webhooks, rotated credentials, or changed channel configurations.
Review alerts regularly. Schedule a monthly review of all alert rules. Disable or tune rules with high false positive rates. Remove rules for decommissioned services. Adjust thresholds based on current traffic patterns.
Use escalation policies for critical alerts. Ensure that critical alerts always have a path to a human who can act, even outside business hours.
API Reference
Alert rules, channels, and mute windows can be managed programmatically via the LogPulse API. All endpoints require an API key with full access scope.
Monitored Services
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/detect/services | List all monitored services |
| POST | /api/v1/detect/services | Add a new service to monitoring |
| PUT | /api/v1/detect/services/:id | Update service detection settings |
| POST | /api/v1/detect/services/:id/activate | Activate monitoring for a service |
| POST | /api/v1/detect/services/:id/dismiss | Dismiss a service from monitoring |
| GET | /api/v1/detect/services/:id/baseline-status | Get baseline learning status |
| GET | /api/v1/detect/services/:id/metrics-history | Get metrics history for a service |
| GET | /api/v1/detect/services/health | Get health overview of all services |
Anomalies
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/detect/anomalies | List anomalies with filtering |
| GET | /api/v1/detect/anomalies/:id | Get a specific anomaly |
| PUT | /api/v1/detect/anomalies/:id/acknowledge | Acknowledge an anomaly |
| PUT | /api/v1/detect/anomalies/:id/dismiss | Dismiss an anomaly |
| PUT | /api/v1/detect/anomalies/:id/feedback | Submit feedback (true/false positive) |
| POST | /api/v1/detect/anomalies/:id/investigate | Trigger AI investigation for an anomaly |
| GET | /api/v1/detect/anomalies/timeline | Get anomaly timeline data |
Response Channels
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/detect/response-channels | List all response channels |
| POST | /api/v1/detect/response-channels | Create a new response channel |
| PUT | /api/v1/detect/response-channels/:id | Update a response channel |
| DELETE | /api/v1/detect/response-channels/:id | Delete a response channel |
Service Dependencies
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/detect/dependencies | List service dependencies |
| POST | /api/v1/detect/dependencies | Create a service dependency |
| DELETE | /api/v1/detect/dependencies/:id | Delete a service dependency |
| POST | /api/v1/detect/dependencies/:id/confirm | Confirm a discovered dependency |
curl -X POST https://api.logpulse.io/api/v1/detect/services \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"service": "api-gateway",
"displayName": "API Gateway",
"metrics": ["error_count", "error_rate"],
"sensitivity": "medium",
"detectionInterval": "2m",
"detectionLookback": "5m",
"urgentDetectionEnabled": true,
"urgentErrorRateThreshold": 50,
"urgentMinBatchSize": 20
}'
curl -X POST https://api.logpulse.io/api/v1/detect/response-channels \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"name": "Ops Slack Channel",
"type": "slack",
"config": {
"webhookUrl": "https://hooks.slack.com/services/T.../B.../xxx"
},
"minSeverity": "high",
"enabled": true
}'