Data Retention & Storage

LogPulse uses a tiered storage architecture built on ClickHouse to balance query performance with storage cost. Logs move automatically through three storage tiers -- Hot, Warm, and Cold -- based on their age and the configured retention policies.

This architecture allows LogPulse to offer fast search over recent data while retaining historical data at a fraction of the cost of keeping everything on high-performance storage. All tiers are searchable from the same Log Explorer interface; the only difference is query latency.

Storage Tiers

Each storage tier is optimized for a different balance of performance and cost. Data moves between tiers automatically based on retention policies -- no manual migration is required.

Tier | Storage Medium | Age Range | Search Latency | Description
Hot | NVMe SSD | 0-7 days | Sub-second | Primary storage for recent logs. All queries against hot data use indexed columnar scans for maximum performance.
Warm | HDD (RAID) | 7-30 days | 1-5 seconds | Secondary storage for moderately recent logs. Data is compressed and stored on high-capacity drives with sequential read optimization.
Cold | Object Storage (S3/GCS) | 30-90 days | 5-30 seconds | Archival storage for historical logs. Data is stored in columnar Parquet format on object storage with on-demand query execution.
Note
The age ranges shown above are defaults. You can customize the tier boundaries per index or per retention policy to match your access patterns and budget.

Hot Tier

The hot tier stores the most recent logs on NVMe SSDs with full ClickHouse columnar indexing. Queries against hot data benefit from primary key indexes, skip indexes, and in-memory caching. This tier handles the vast majority of interactive queries, live tail streams, and alert evaluations.

Hot tier capacity is determined by your plan. When hot storage is full, the oldest data is automatically migrated to the warm tier. The migration process runs continuously in the background without impacting query performance.

Warm Tier

The warm tier uses high-capacity HDDs with ZSTD compression. Data is reorganized into larger, sequential file blocks optimized for range scans. Warm tier queries are slower than hot tier but significantly faster than cold tier, making this tier suitable for incident investigations that span multiple days.

Cold Tier

The cold tier stores data in columnar Parquet format on object storage (Amazon S3, Google Cloud Storage, or Azure Blob Storage). Queries against cold data are executed on-demand using ephemeral compute instances. Cold queries have higher latency but are billed only for the data scanned, making this tier cost-effective for infrequent access patterns.

Cold data can be rehydrated to the warm tier on demand for faster access during extended investigations. Rehydration typically completes within 5-15 minutes depending on the volume of data.

Retention Policies

Retention policies define how long logs are kept before automatic deletion. Each plan includes a default retention period, and you can configure custom retention per index.

Plan | Daily Ingestion | Hot Retention | Total Retention | Indexes
Free | 100 MB/day | 7 days | 7 days | 1
Starter | 1 GB/day | 7 days | 14 days | 5
Pro | 5 GB/day | 14 days | 30 days | 25
Business | 25 GB/day | 30 days | 90 days | 100
Enterprise (coming soon) | Custom | Custom | Up to 365 days | Unlimited

Custom Retention Per Index

You can set a custom retention period on any index, overriding the plan default. Custom retention can be shorter or longer than the plan default, within the maximum allowed by your plan.

Set custom retention via API
curl -X PUT https://api.logpulse.io/api/v1/indexes/web-access-logs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LOGPULSE_API_KEY" \
  -d '{
    "retention": {
      "hot_days": 3,
      "warm_days": 14,
      "cold_days": 60,
      "total_days": 77
    }
  }'
Tip
Set shorter retention for high-volume, low-value logs (like health checks or debug-level logs) to reduce storage costs. Reserve longer retention for audit logs, security events, and error logs that may be needed for investigation.
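The per-tier values in a retention payload must add up to total_days (in the example above, 3 + 14 + 60 = 77). A minimal sketch of a client-side check you could run before calling the API — the helper is hypothetical, not part of any LogPulse SDK:

```python
def validate_retention(retention: dict) -> None:
    """Check that per-tier durations are non-negative and sum to total_days."""
    hot = retention.get("hot_days", 0)
    warm = retention.get("warm_days", 0)
    cold = retention.get("cold_days", 0)
    total = retention.get("total_days", hot + warm + cold)
    if min(hot, warm, cold) < 0:
        raise ValueError("tier durations must be non-negative")
    if hot + warm + cold != total:
        raise ValueError(
            f"tier durations sum to {hot + warm + cold}, but total_days is {total}"
        )

# The payload from the example above: 3 + 14 + 60 = 77
validate_retention({"hot_days": 3, "warm_days": 14, "cold_days": 60, "total_days": 77})
```

Running the check locally catches mismatched totals before the API rejects (or silently reinterprets) the request.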

Index Management

Creating Indexes

An index is a logical grouping of logs with shared retention policies and access controls. By default, all logs are stored in the "default" index. Create additional indexes to separate logs by environment, service, team, or compliance requirement.

Create an index via API
curl -X POST https://api.logpulse.io/api/v1/indexes \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LOGPULSE_API_KEY" \
  -d '{
    "name": "security-audit",
    "description": "Security and audit trail logs",
    "retention": {
      "total_days": 90
    }
  }'

Index Patterns

Index patterns route incoming logs to the correct index based on source, level, or attribute values. Patterns are evaluated in priority order and the first match determines the target index.

Index pattern examples
# Route security logs to the security-audit index
Pattern: source=auth-service OR source=firewall OR level=security
Target: security-audit
Priority: 1

# Route debug logs to a short-retention index
Pattern: level=debug OR level=trace
Target: debug-logs
Priority: 2

# Everything else goes to the default index
Pattern: *
Target: default
Priority: 100
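The first-match-by-priority semantics above can be sketched as follows. Each pattern is reduced to a simple predicate (real LPQL patterns are richer than this), and the rule set mirrors the three examples:

```python
# Each rule: (priority, predicate, target index). First match in priority order wins.
RULES = [
    (1, lambda log: log.get("source") in ("auth-service", "firewall")
        or log.get("level") == "security", "security-audit"),
    (2, lambda log: log.get("level") in ("debug", "trace"), "debug-logs"),
    (100, lambda log: True, "default"),  # catch-all, like Pattern: *
]

def route(log: dict) -> str:
    """Return the target index for a log record using first-match priority order."""
    for _priority, matches, target in sorted(RULES, key=lambda r: r[0]):
        if matches(log):
            return target
    return "default"
```

For example, `route({"source": "firewall"})` resolves to `security-audit` even though the catch-all pattern also matches, because priority 1 is evaluated first.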

Index-Level Retention

Each index can have its own retention policy that overrides the organization default. This allows you to keep security audit logs for 90 days while deleting debug logs after 3 days, all within the same LogPulse organization.

Index | Hot | Warm | Cold | Total | Use Case
default | 7 days | 23 days | 0 days | 30 days | General application logs
security-audit | 14 days | 46 days | 30 days | 90 days | Compliance and audit trail
debug-logs | 3 days | 0 days | 0 days | 3 days | Verbose debug output
error-logs | 14 days | 46 days | 30 days | 90 days | Error investigation and trending
access-logs | 7 days | 23 days | 30 days | 60 days | Web access and request logs

Data Lifecycle

Logs follow a predictable lifecycle from ingestion to deletion. Understanding this lifecycle helps you plan retention policies and estimate storage costs.

Data lifecycle stages
Ingestion
  |
  v
Hot Storage (NVMe SSD)
  - Full columnar indexing
  - Sub-second query performance
  - Default: 7 days
  |
  v
Warm Storage (HDD)
  - ZSTD compressed
  - Sequential scan optimized
  - Default: 7-30 days
  |
  v
Cold Storage (Object Storage)
  - Parquet format
  - On-demand query execution
  - Default: 30-90 days
  |
  v
Deletion (or Archive)
  - Automatic after retention expires
  - Optional: export to your own storage before deletion

Tier transitions happen automatically in the background. The migration process is designed to be invisible: queries that span multiple tiers are seamlessly merged, and there is no gap in data availability during transitions.
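Under the default boundaries, a log's tier is purely a function of its age. A sketch of that mapping — the boundary values are the defaults shown in the diagram; your configured retention policy may differ:

```python
def tier_for_age(age_days: float, hot_days: int = 7, warm_days: int = 30,
                 total_days: int = 90) -> str:
    """Map a log's age to its storage tier under the default tier boundaries."""
    if age_days < hot_days:
        return "hot"
    if age_days < warm_days:
        return "warm"
    if age_days < total_days:
        return "cold"
    return "deleted"  # past retention: removed (or exported, if archival is configured)
```

So a 10-day-old log sits on warm HDD storage, while a 45-day-old log is queried from Parquet on object storage.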

Note
Data deletion is permanent and irreversible. If you need to retain data beyond your retention policy, configure an archival export to your own object storage before the retention window expires.

Storage Costs

LogPulse storage costs vary by tier. The tiered architecture ensures that most of your data is stored at the lowest cost while keeping recent data on fast storage.

Tier | LogPulse Cost (per GB/month) | Splunk Equivalent | Datadog Equivalent
Hot (SSD) | $0.80 | $2.50 | $1.70
Warm (HDD) | $0.25 | $1.00 | $0.75
Cold (Object) | $0.03 | $0.30 | $0.25
Ingestion | $0.10/GB ingested | $0.50/GB | $0.10/GB
Tip
LogPulse achieves a typical compression ratio of 10:1 on raw log data. A service generating 100 GB/day of raw logs will consume approximately 10 GB/day of stored data after compression.

Cost Estimation Example

Consider an application generating 50 GB/day of raw logs with a 90-day retention policy:

Storage cost calculation
Raw ingestion:        50 GB/day
Compression ratio:    10:1
Stored per day:       5 GB/day

Hot storage (14 days):
  5 GB/day x 14 days = 70 GB
  70 GB x $0.80/GB   = $56.00/month

Warm storage (days 15-30):
  5 GB/day x 16 days = 80 GB
  80 GB x $0.25/GB   = $20.00/month

Cold storage (days 31-90):
  5 GB/day x 60 days = 300 GB
  300 GB x $0.03/GB  = $9.00/month

Ingestion:
  50 GB/day x 30 days x $0.10/GB = $150.00/month

Total estimated monthly cost: $235.00/month
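The same arithmetic as a small script, so you can plug in your own numbers. Prices are taken from the cost table above; treat this as a planning estimate, not a billing calculation:

```python
def monthly_storage_cost(raw_gb_per_day: float, hot_days: int, warm_days: int,
                         cold_days: int, compression: float = 10.0) -> float:
    """Estimate monthly cost: tiered storage footprint plus 30 days of ingestion."""
    stored_per_day = raw_gb_per_day / compression
    hot = stored_per_day * hot_days * 0.80     # $/GB/month, hot SSD
    warm = stored_per_day * warm_days * 0.25   # $/GB/month, warm HDD
    cold = stored_per_day * cold_days * 0.03   # $/GB/month, cold object storage
    ingestion = raw_gb_per_day * 30 * 0.10     # $/GB ingested, 30-day month
    return hot + warm + cold + ingestion

# 50 GB/day raw, 14 hot + 16 warm + 60 cold days, 10:1 compression
print(monthly_storage_cost(50, 14, 16, 60))  # matches the ~$235/month worked example
```

Changing the tier split (for example, shrinking hot_days from 14 to 7) shows immediately how much of the bill is driven by SSD residency versus ingestion.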

Archival & Export

LogPulse can export log data to your own object storage for long-term archival beyond the retention window, compliance preservation, or integration with data lake pipelines.

Supported Destinations

Destination | Configuration | Format Options
Amazon S3 | Bucket name, region, IAM role ARN or access key | JSON, CSV, Parquet
Google Cloud Storage | Bucket name, service account key JSON | JSON, CSV, Parquet
Azure Blob Storage | Container name, storage account, SAS token or managed identity | JSON, CSV, Parquet

Scheduled Exports

Scheduled exports run automatically at a configured interval and export logs matching an LPQL query to the target destination. Exports can run hourly, daily, or weekly.

Scheduled export configuration
curl -X POST https://api.logpulse.io/api/v1/exports \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LOGPULSE_API_KEY" \
  -d '{
    "name": "daily-security-export",
    "query": "index=security-audit",
    "schedule": "daily",
    "format": "parquet",
    "destination": {
      "type": "s3",
      "bucket": "company-log-archive",
      "prefix": "logpulse/security/",
      "region": "us-east-1",
      "role_arn": "arn:aws:iam::123456789012:role/logpulse-export"
    },
    "compression": "zstd",
    "partition_by": ["date", "source"]
  }'

Exported files are partitioned by date and optionally by additional fields (source, level, etc.) for efficient querying with tools like Athena, BigQuery, or Spark.
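Hive-style `key=value` partition paths are what engines like Athena and Spark expect. A sketch of how such keys are typically assembled — the exact key layout LogPulse produces is an assumption here, so inspect a real export before wiring up downstream jobs:

```python
def partition_key(prefix: str, partitions: dict, filename: str) -> str:
    """Build a Hive-style partition path under an export prefix.

    Example result: prefix/date=2024-05-01/source=auth-service/part-0000.parquet
    """
    parts = "/".join(f"{key}={value}" for key, value in partitions.items())
    return f"{prefix.rstrip('/')}/{parts}/{filename}"

print(partition_key("logpulse/security/",
                    {"date": "2024-05-01", "source": "auth-service"},
                    "part-0000.parquet"))
# logpulse/security/date=2024-05-01/source=auth-service/part-0000.parquet
```

Partition columns that appear in the path (date, source) can then be pruned at query time, so a one-day query over one source never scans the rest of the archive.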

Format Options

Format | Best For | Compression | Typical Size
JSON | General purpose, human-readable, tool compatibility | gzip, zstd | ~2x compressed log size
CSV | Spreadsheet analysis, simple tooling | gzip | ~1.5x compressed log size
Parquet | Data lake queries (Athena, BigQuery, Spark), columnar analytics | snappy, zstd | ~0.5x compressed log size
Tip
Use Parquet format for archival exports. It provides the best compression, fastest query performance with analytics engines, and supports schema evolution for changing log structures over time.

Compliance Retention

LogPulse provides features to help meet common compliance requirements for log retention and data management. Consult your compliance team for specific requirements applicable to your organization.

GDPR - Right to Deletion

To comply with GDPR right-to-erasure requests, LogPulse supports targeted deletion of logs containing specific personal data. Use the data deletion API to remove logs matching a query:

GDPR deletion request
curl -X POST https://api.logpulse.io/api/v1/data/delete \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LOGPULSE_API_KEY" \
  -d '{
    "query": "attributes.user_id=\"usr_deleted_user\"",
    "reason": "GDPR right-to-erasure request #12345",
    "dry_run": false
  }'
Warning
Deletion requests are irreversible. Use dry_run: true first to preview the number of affected records. Deletion may take up to 24 hours to propagate across all storage tiers.

SOC 2 - Audit Trail Retention

SOC 2 Type II audits typically require an audit trail retained for at least 1 year. LogPulse audit logs (user actions, API key usage, configuration changes) are automatically retained for 90 days on current plans (up to 365 days on Enterprise, coming soon). For longer retention, configure archival exports to your own storage. Application logs used for security monitoring should use the maximum retention available on your plan.

HIPAA Requirements

For organizations handling protected health information (PHI), LogPulse offers HIPAA-eligible configurations on the Enterprise plan. This includes encryption at rest (AES-256), encryption in transit (TLS 1.2+), audit logging of all data access, and a signed Business Associate Agreement (BAA).

HIPAA requires a minimum 6-year retention period for certain records. Configure indexes containing PHI-related logs with a 2190-day (6-year) retention policy or set up archival exports to your own HIPAA-compliant storage.

Estimating Storage

Use these guidelines to estimate your storage requirements and plan your LogPulse tier and budget.

Estimating Daily Log Volume

Log volume depends on your application architecture, log verbosity, and traffic patterns. Here are typical ranges by source type:

Source Type | Typical Volume | Notes
Web server (access logs) | 1-5 GB/day per server | Depends on request volume; 1 KB per request average
Application service | 0.5-2 GB/day per service | Depends on log verbosity; production should use info level
Database | 0.1-1 GB/day per instance | Slow query logs and error logs; general logs are higher
Kubernetes cluster | 2-10 GB/day per cluster | Depends on pod count and container verbosity
Load balancer | 1-5 GB/day per LB | Access logs at ~0.5 KB per request
Security / Firewall | 0.5-5 GB/day per device | Depends on traffic volume and rule verbosity
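To estimate a whole fleet, multiply each source type's instance count by a per-source figure from the table. A quick sketch using mid-range picks — the per-source numbers are illustrative defaults, not measurements from your environment:

```python
# GB/day per instance: mid-range picks from the volume table above (illustrative)
TYPICAL_GB_PER_DAY = {
    "web_server": 3.0,
    "app_service": 1.0,
    "database": 0.5,
    "k8s_cluster": 6.0,
    "load_balancer": 3.0,
    "firewall": 2.0,
}

def estimate_daily_volume(fleet: dict) -> float:
    """Sum raw GB/day across a fleet described as {source_type: instance_count}."""
    return sum(TYPICAL_GB_PER_DAY[kind] * count for kind, count in fleet.items())

# e.g. 4 web servers + 10 services + 2 databases + 1 Kubernetes cluster
print(estimate_daily_volume({"web_server": 4, "app_service": 10,
                             "database": 2, "k8s_cluster": 1}))  # 29.0 GB/day raw
```

An estimate like this, combined with the plan table above, tells you whether you fit a Business plan's 25 GB/day quota or need to trim verbosity first.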

Compression Ratios

LogPulse applies ZSTD compression to all stored data. Typical compression ratios vary by log format:

Log Type | Typical Compression Ratio | Effective Storage
Structured JSON logs | 10:1 to 15:1 | 1 GB raw = 70-100 MB stored
Unstructured text logs | 8:1 to 12:1 | 1 GB raw = 85-125 MB stored
Access logs (CLF/ELF) | 12:1 to 20:1 | 1 GB raw = 50-85 MB stored
Syslog | 10:1 to 14:1 | 1 GB raw = 70-100 MB stored

For planning purposes, use a conservative 10:1 compression ratio. Actual ratios may be higher, especially for highly repetitive log formats like access logs.
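Applying the conservative planning ratio, the total on-disk footprint for a retention window is raw volume × retention ÷ ratio. A two-line sketch of that arithmetic:

```python
def stored_size_gb(raw_gb: float, ratio: float = 10.0) -> float:
    """Stored size after compression, using the conservative 10:1 planning ratio."""
    return raw_gb / ratio

def retained_gb(raw_gb_per_day: float, retention_days: int, ratio: float = 10.0) -> float:
    """Total stored footprint for a retention window at a given compression ratio."""
    return raw_gb_per_day * retention_days / ratio

print(retained_gb(50, 90))  # 450.0 GB on disk for 50 GB/day raw over 90 days
```

If your logs are mostly access logs, re-running with ratio=15 shows how much headroom the conservative 10:1 assumption leaves.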

Managing Storage

LogPulse provides tools to monitor and manage your storage usage proactively.

Storage Dashboard

The Storage Dashboard (Settings, then Storage) shows real-time and historical storage usage broken down by index and tier. Key metrics include total stored volume, daily ingestion rate, storage growth trend, and projected time until quota is reached.

Storage Quota Alerts

LogPulse automatically sends notifications when your storage usage approaches plan limits:

Threshold | Notification | Action Required
80% of daily ingestion quota | Email to organization Owners | Review ingestion volume; consider upgrading or reducing log verbosity
90% of daily ingestion quota | Email and dashboard banner | Immediate action recommended; logs may be throttled at 100%
100% of daily ingestion quota | Ingestion throttling begins | Excess logs are queued for up to 1 hour; upgrade plan or reduce volume
80% of total storage quota | Email to organization Owners | Review retention policies; consider shorter retention or archival exports
95% of total storage quota | Email, dashboard banner, and Slack (if configured) | Immediate action required; oldest data may be deleted early to free space
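The ingestion thresholds above translate to a simple check you could mirror in your own monitoring. The threshold values come from the table; the severity labels are my own shorthand:

```python
# Highest threshold first; values from the ingestion-quota rows above
INGESTION_THRESHOLDS = [(1.00, "throttling"), (0.90, "urgent"), (0.80, "warning")]

def ingestion_alert(used_gb: float, quota_gb: float):
    """Return the highest ingestion-quota threshold crossed, or None if under 80%."""
    usage = used_gb / quota_gb
    for threshold, label in INGESTION_THRESHOLDS:
        if usage >= threshold:
            return label
    return None

print(ingestion_alert(4.6, 5.0))  # 'urgent' -- 92% of a 5 GB/day quota
```

Evaluating your own metering data against the same thresholds lets you page on-call before LogPulse begins queueing excess logs at 100%.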

Cleanup Recommendations

The Storage Dashboard includes automated cleanup recommendations based on your usage patterns:

Reduce debug log retention. If debug or trace level logs consume more than 30% of your storage, consider reducing their retention to 1-3 days or filtering them out during ingestion using ETL pipelines.

Archive before deletion. For indexes approaching retention expiry, configure archival exports to preserve data in your own storage at a lower cost.

Drop unused fields. Large attribute payloads increase storage consumption. Use ETL pipelines to drop fields that are never searched or filtered.

Consolidate duplicate logs. If multiple sources emit the same log event (for example, sidecar and application both logging the same request), deduplicate at the ingestion layer.

Tip
Run a quarterly storage review. Check which indexes consume the most space, which fields are never queried, and whether your retention policies still match your operational and compliance needs.