Data Retention & Storage
LogPulse uses a tiered storage architecture built on ClickHouse to balance query performance with storage cost. Logs move automatically through three storage tiers -- Hot, Warm, and Cold -- based on their age and the configured retention policies.
This architecture allows LogPulse to offer fast search over recent data while retaining historical data at a fraction of the cost of keeping everything on high-performance storage. All tiers are searchable from the same Log Explorer interface; the only difference is query latency.
Storage Tiers
Each storage tier is optimized for a different balance of performance and cost. Data moves between tiers automatically based on retention policies -- no manual migration is required.
| Tier | Storage Medium | Age Range | Search Latency | Description |
|---|---|---|---|---|
| Hot | NVMe SSD | 0-7 days | Sub-second | Primary storage for recent logs. All queries against hot data use indexed columnar scans for maximum performance. |
| Warm | HDD (RAID) | 7-30 days | 1-5 seconds | Secondary storage for moderately recent logs. Data is compressed and stored on high-capacity drives with sequential read optimization. |
| Cold | Object Storage (S3/GCS) | 30-90 days | 5-30 seconds | Archival storage for historical logs. Data is stored in columnar Parquet format on object storage with on-demand query execution. |
Hot Tier
The hot tier stores the most recent logs on NVMe SSDs with full ClickHouse columnar indexing. Queries against hot data benefit from primary key indexes, skip indexes, and in-memory caching. This tier handles the vast majority of interactive queries, live tail streams, and alert evaluations.
Hot tier capacity is determined by your plan. When hot storage is full, the oldest data is automatically migrated to the warm tier. The migration process runs continuously in the background without impacting query performance.
Warm Tier
The warm tier uses high-capacity HDDs with ZSTD compression. Data is reorganized into larger, sequential file blocks optimized for range scans. Warm tier queries are slower than hot tier but significantly faster than cold tier, making this tier suitable for incident investigations that span multiple days.
Cold Tier
The cold tier stores data in columnar Parquet format on object storage (Amazon S3, Google Cloud Storage, or Azure Blob Storage). Queries against cold data are executed on-demand using ephemeral compute instances. Cold queries have higher latency but are billed only for the data scanned, making this tier cost-effective for infrequent access patterns.
Cold data can be rehydrated to the warm tier on demand for faster access during extended investigations. Rehydration typically completes within 5-15 minutes depending on the volume of data.
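The age-based tiering described above can be sketched as a small helper. The day thresholds mirror the default 7/23/60-day split used in the examples below and are illustrative only; the actual migration logic is internal to LogPulse:

```python
from datetime import datetime, timedelta, timezone

def tier_for(log_time, now, hot_days=7, warm_days=23, cold_days=60):
    """Return which tier a log of the given age would live in,
    or 'deleted' once total retention has expired."""
    age = (now - log_time).days
    if age < hot_days:
        return "hot"
    if age < hot_days + warm_days:
        return "warm"
    if age < hot_days + warm_days + cold_days:
        return "cold"
    return "deleted"

now = datetime.now(timezone.utc)
print(tier_for(now - timedelta(days=2), now))    # -> hot
print(tier_for(now - timedelta(days=20), now))   # -> warm
print(tier_for(now - timedelta(days=45), now))   # -> cold
print(tier_for(now - timedelta(days=120), now))  # -> deleted
```

Note that the boundaries are cumulative: a log enters cold storage after hot_days + warm_days, not after cold_days.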
Retention Policies
Retention policies define how long logs are kept before automatic deletion. Each plan includes a default retention period, and you can configure custom retention per index.
| Plan | Daily Ingestion | Hot Retention | Total Retention | Indexes |
|---|---|---|---|---|
| Free | 100 MB/day | 7 days | 7 days | 1 |
| Starter | 1 GB/day | 7 days | 14 days | 5 |
| Pro | 5 GB/day | 14 days | 30 days | 25 |
| Business | 25 GB/day | 30 days | 90 days | 100 |
| Enterprise (coming soon) | Custom | Custom | Up to 365 days | Unlimited |
Custom Retention Per Index
You can set a custom retention period on any index, overriding the plan default. Custom retention can be shorter or longer than the plan default, within the maximum allowed by your plan.
curl -X PUT https://api.logpulse.io/api/v1/indexes/web-access-logs \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"retention": {
"hot_days": 3,
"warm_days": 14,
"cold_days": 60,
"total_days": 77
}
}'

Index Management
Creating Indexes
An index is a logical grouping of logs with shared retention policies and access controls. By default, all logs are stored in the "default" index. Create additional indexes to separate logs by environment, service, team, or compliance requirement.
curl -X POST https://api.logpulse.io/api/v1/indexes \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"name": "security-audit",
"description": "Security and audit trail logs",
"retention": {
"total_days": 90
}
}'

Index Patterns
Index patterns route incoming logs to the correct index based on source, level, or attribute values. Patterns are evaluated in priority order and the first match determines the target index.
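The first-match semantics can be sketched as follows. The match functions here reduce LogPulse's pattern syntax to simple field checks for illustration; they are not the real pattern evaluator:

```python
def route(log, patterns):
    """Return the target index for a log record.

    patterns: list of (match_fn, target, priority) tuples, evaluated
    in ascending priority order; the first match wins.
    """
    for match, target, _priority in sorted(patterns, key=lambda p: p[2]):
        if match(log):
            return target
    return "default"

patterns = [
    (lambda l: l.get("source") in ("auth-service", "firewall")
               or l.get("level") == "security", "security-audit", 1),
    (lambda l: l.get("level") in ("debug", "trace"), "debug-logs", 2),
    (lambda l: True, "default", 100),  # catch-all, lowest priority
]

print(route({"source": "firewall", "level": "info"}, patterns))  # security-audit
print(route({"source": "api", "level": "debug"}, patterns))      # debug-logs
print(route({"source": "api", "level": "info"}, patterns))       # default
```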
# Route security logs to the security-audit index
Pattern: source=auth-service OR source=firewall OR level=security
Target: security-audit
Priority: 1
# Route debug logs to a short-retention index
Pattern: level=debug OR level=trace
Target: debug-logs
Priority: 2
# Everything else goes to the default index
Pattern: *
Target: default
Priority: 100

Index-Level Retention
Each index can have its own retention policy that overrides the organization default. This allows you to keep security audit logs for 90 days while deleting debug logs after 3 days, all within the same LogPulse organization.
| Index | Hot | Warm | Cold | Total | Use Case |
|---|---|---|---|---|---|
| default | 7 days | 23 days | 0 days | 30 days | General application logs |
| security-audit | 14 days | 46 days | 30 days | 90 days | Compliance and audit trail |
| debug-logs | 3 days | 0 days | 0 days | 3 days | Verbose debug output |
| error-logs | 14 days | 46 days | 30 days | 90 days | Error investigation and trending |
| access-logs | 7 days | 23 days | 30 days | 60 days | Web access and request logs |
Data Lifecycle
Logs follow a predictable lifecycle from ingestion to deletion. Understanding this lifecycle helps you plan retention policies and estimate storage costs.
Ingestion
|
v
Hot Storage (NVMe SSD)
- Full columnar indexing
- Sub-second query performance
- Default: 7 days
|
v
Warm Storage (HDD)
- ZSTD compressed
- Sequential scan optimized
- Default: 7-30 days
|
v
Cold Storage (Object Storage)
- Parquet format
- On-demand query execution
- Default: 30-90 days
|
v
Deletion (or Archive)
- Automatic after retention expires
- Optional: export to your own storage before deletion

Tier transitions happen automatically in the background. The migration process is designed to be invisible: queries that span multiple tiers are seamlessly merged, and there is no gap in data availability during transitions.
Storage Costs
LogPulse storage costs vary by tier. The tiered architecture ensures that most of your data is stored at the lowest cost while keeping recent data on fast storage.
| Tier | LogPulse Cost (per GB/month) | Splunk Equivalent | Datadog Equivalent |
|---|---|---|---|
| Hot (SSD) | $0.80 | $2.50 | $1.70 |
| Warm (HDD) | $0.25 | $1.00 | $0.75 |
| Cold (Object) | $0.03 | $0.30 | $0.25 |
| Ingestion | $0.10/GB ingested | $0.50/GB | $0.10/GB |
Cost Estimation Example
Consider an application generating 50 GB/day of raw logs with a 90-day retention policy:
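The worked breakdown that follows can be reproduced with a short script, using the per-GB prices from the cost table above (variable names are illustrative):

```python
raw_per_day = 50                              # GB/day raw ingestion
compression = 10                              # conservative 10:1 ratio
stored_per_day = raw_per_day / compression    # 5 GB/day after compression

hot_days, warm_days, cold_days = 14, 16, 60   # 90-day total retention
hot = stored_per_day * hot_days * 0.80        # hot tier, $/month
warm = stored_per_day * warm_days * 0.25      # warm tier, $/month
cold = stored_per_day * cold_days * 0.03      # cold tier, $/month
ingest = raw_per_day * 30 * 0.10              # ingestion billed on raw GB

total = hot + warm + cold + ingest
print(f"hot=${hot:.2f} warm=${warm:.2f} cold=${cold:.2f} "
      f"ingest=${ingest:.2f} total=${total:.2f}")
# hot=$56.00 warm=$20.00 cold=$9.00 ingest=$150.00 total=$235.00
```

Note that ingestion is billed on raw volume while storage is billed on compressed volume, which is why the ingestion line dominates this example.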
Raw ingestion: 50 GB/day
Compression ratio: 10:1
Stored per day: 5 GB/day
Hot storage (14 days):
5 GB/day x 14 days = 70 GB
70 GB x $0.80/GB = $56.00/month
Warm storage (days 15-30):
5 GB/day x 16 days = 80 GB
80 GB x $0.25/GB = $20.00/month
Cold storage (days 31-90):
5 GB/day x 60 days = 300 GB
300 GB x $0.03/GB = $9.00/month
Ingestion:
50 GB/day x 30 days x $0.10/GB = $150.00/month
Total estimated monthly cost: $235.00/month

Archival & Export
LogPulse can export log data to your own object storage for long-term archival beyond the retention window, compliance preservation, or integration with data lake pipelines.
Supported Destinations
| Destination | Configuration | Format Options |
|---|---|---|
| Amazon S3 | Bucket name, region, IAM role ARN or access key | JSON, CSV, Parquet |
| Google Cloud Storage | Bucket name, service account key JSON | JSON, CSV, Parquet |
| Azure Blob Storage | Container name, storage account, SAS token or managed identity | JSON, CSV, Parquet |
Scheduled Exports
Scheduled exports run automatically on a configured schedule and export logs matching an LPQL query to the target destination. Exports can run hourly, daily, or weekly.
curl -X POST https://api.logpulse.io/api/v1/exports \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"name": "daily-security-export",
"query": "index=security-audit",
"schedule": "daily",
"format": "parquet",
"destination": {
"type": "s3",
"bucket": "company-log-archive",
"prefix": "logpulse/security/",
"region": "us-east-1",
"role_arn": "arn:aws:iam::123456789012:role/logpulse-export"
},
"compression": "zstd",
"partition_by": ["date", "source"]
}'

Exported files are partitioned by date and optionally by additional fields (source, level, etc.) for efficient querying with tools like Athena, BigQuery, or Spark.
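One way to picture the resulting object layout: assuming a hive-style key=value directory scheme (an assumption on our part; the exact object key format is not specified here, though hive-style partitions are what Athena, BigQuery, and Spark discover automatically), the `partition_by` fields would produce prefixes like:

```python
from datetime import date

def partition_prefix(base, partition_by, values):
    """Build a hive-style partition path (key=value segments).
    Hypothetical layout for illustration, not LogPulse's documented format."""
    parts = [f"{key}={values[key]}" for key in partition_by]
    return "/".join([base.rstrip("/")] + parts) + "/"

print(partition_prefix(
    "logpulse/security/",
    ["date", "source"],
    {"date": date(2024, 6, 1).isoformat(), "source": "auth-service"},
))
# logpulse/security/date=2024-06-01/source=auth-service/
```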
Format Options
| Format | Best For | Compression | Typical Size |
|---|---|---|---|
| JSON | General purpose, human-readable, tool compatibility | gzip, zstd | ~2x compressed log size |
| CSV | Spreadsheet analysis, simple tooling | gzip | ~1.5x compressed log size |
| Parquet | Data lake queries (Athena, BigQuery, Spark), columnar analytics | snappy, zstd | ~0.5x compressed log size |
Compliance Retention
LogPulse provides features to help meet common compliance requirements for log retention and data management. Consult your compliance team for specific requirements applicable to your organization.
GDPR - Right to Deletion
To comply with GDPR right-to-erasure requests, LogPulse supports targeted deletion of logs containing specific personal data. Use the data deletion API to remove logs matching a query:
curl -X POST https://api.logpulse.io/api/v1/data/delete \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"query": "attributes.user_id=\"usr_deleted_user\"",
"reason": "GDPR right-to-erasure request #12345",
"dry_run": false
}'

SOC 2 - Audit Trail Retention
SOC 2 Type II audits typically require an audit trail retained for at least 1 year. LogPulse audit logs (user actions, API key usage, configuration changes) are automatically retained for 90 days on current plans (up to 365 days on Enterprise, coming soon). For longer retention, configure archival exports to your own storage. Application logs used for security monitoring should use the maximum retention available on your plan.
HIPAA Requirements
For organizations handling protected health information (PHI), LogPulse offers HIPAA-eligible configurations on the Enterprise plan. This includes encryption at rest (AES-256), encryption in transit (TLS 1.2+), audit logging of all data access, and a signed Business Associate Agreement (BAA).
HIPAA requires a minimum 6-year retention period for certain records. Configure indexes containing PHI-related logs with a 2190-day (6-year) retention policy or set up archival exports to your own HIPAA-compliant storage.
Estimating Storage
Use these guidelines to estimate your storage requirements and plan your LogPulse tier and budget.
Estimating Daily Log Volume
Log volume depends on your application architecture, log verbosity, and traffic patterns. Here are typical ranges by source type:
| Source Type | Typical Volume | Notes |
|---|---|---|
| Web server (access logs) | 1-5 GB/day per server | Depends on request volume; 1 KB per request average |
| Application service | 0.5-2 GB/day per service | Depends on log verbosity; production should use info level |
| Database | 0.1-1 GB/day per instance | Slow query logs and error logs; general logs are higher |
| Kubernetes cluster | 2-10 GB/day per cluster | Depends on pod count and container verbosity |
| Load balancer | 1-5 GB/day per LB | Access logs at ~0.5 KB per request |
| Security / Firewall | 0.5-5 GB/day per device | Depends on traffic volume and rule verbosity |
Compression Ratios
LogPulse applies ZSTD compression to all stored data. Typical compression ratios vary by log format:
| Log Type | Typical Compression Ratio | Effective Storage |
|---|---|---|
| Structured JSON logs | 10:1 to 15:1 | 1 GB raw = 70-100 MB stored |
| Unstructured text logs | 8:1 to 12:1 | 1 GB raw = 85-125 MB stored |
| Access logs (CLF/ELF) | 12:1 to 20:1 | 1 GB raw = 50-85 MB stored |
| Syslog | 10:1 to 14:1 | 1 GB raw = 70-100 MB stored |
For planning purposes, use a conservative 10:1 compression ratio. Actual ratios may be higher, especially for highly repetitive log formats like access logs.
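A back-of-the-envelope estimate combines the per-source volumes above with the conservative 10:1 planning ratio; the stack below is a made-up example:

```python
def stored_gb(raw_gb_per_day, retention_days, compression=10):
    """Steady-state resident storage: daily compressed volume x retention."""
    return raw_gb_per_day / compression * retention_days

# Illustrative stack using midpoints of the per-source ranges above
daily = {
    "web-servers (x3)": 3 * 3.0,   # GB/day
    "app-services (x5)": 5 * 1.0,
    "k8s-cluster": 6.0,
}
raw = sum(daily.values())          # 20.0 GB/day raw
print(raw, stored_gb(raw, 30))     # 20.0 60.0 -> ~60 GB resident at 30-day retention
```

Because this uses the conservative 10:1 ratio, actual resident storage will usually come in lower, especially for access-log-heavy workloads.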
Managing Storage
LogPulse provides tools to monitor and manage your storage usage proactively.
Storage Dashboard
The Storage Dashboard (Settings, then Storage) shows real-time and historical storage usage broken down by index and tier. Key metrics include total stored volume, daily ingestion rate, storage growth trend, and projected time until quota is reached.
Storage Quota Alerts
LogPulse automatically sends notifications when your storage usage approaches plan limits:
| Threshold | Notification | Action Required |
|---|---|---|
| 80% of daily ingestion quota | Email to organization Owners | Review ingestion volume; consider upgrading or reducing log verbosity |
| 90% of daily ingestion quota | Email and dashboard banner | Immediate action recommended; logs may be throttled at 100% |
| 100% of daily ingestion quota | Ingestion throttling begins | Excess logs are queued for up to 1 hour; upgrade plan or reduce volume |
| 80% of total storage quota | Email to organization Owners | Review retention policies; consider shorter retention or archival exports |
| 95% of total storage quota | Email, dashboard banner, and Slack (if configured) | Immediate action required; oldest data may be deleted early to free space |
Cleanup Recommendations
The Storage Dashboard includes automated cleanup recommendations based on your usage patterns:
Reduce debug log retention. If debug or trace level logs consume more than 30% of your storage, consider reducing their retention to 1-3 days or filtering them out during ingestion using ETL pipelines.
Archive before deletion. For indexes approaching retention expiry, configure archival exports to preserve data in your own storage at a lower cost.
Drop unused fields. Large attribute payloads increase storage consumption. Use ETL pipelines to drop fields that are never searched or filtered.
Consolidate duplicate logs. If multiple sources emit the same log event (for example, sidecar and application both logging the same request), deduplicate at the ingestion layer.