Data Retention & Storage
LogPulse uses a tiered storage architecture built on ClickHouse to balance query performance with storage cost. Logs move automatically through three storage tiers -- Hot, Warm, and Cold -- based on their age and the configured retention policies.
This architecture allows LogPulse to offer fast search over recent data while retaining historical data at a fraction of the cost of keeping everything on high-performance storage. All tiers are searchable from the same Log Explorer interface; the only difference is query latency.
Storage Tiers
Each storage tier is optimized for a different balance of performance and cost. Data moves between tiers automatically based on retention policies -- no manual migration is required.
| Tier | Storage Medium | Age Range | Search Latency | Description |
|---|---|---|---|---|
| Hot | NVMe SSD | 0-7 days | Sub-second | Primary storage for recent logs. All queries against hot data use indexed columnar scans for maximum performance. |
| Warm | HDD (RAID) | 7-30 days | 1-5 seconds | Secondary storage for moderately recent logs. Data is compressed and stored on high-capacity drives with sequential read optimization. |
| Cold | Object Storage (S3/GCS) | 30-90 days | 5-30 seconds | Archival storage for historical logs. Data is stored in columnar Parquet format on object storage with on-demand query execution. |
Hot Tier
The hot tier stores the most recent logs on NVMe SSDs with full ClickHouse columnar indexing. Queries against hot data benefit from primary key indexes, skip indexes, and in-memory caching. This tier handles the vast majority of interactive queries, live tail streams, and alert evaluations.
Hot tier capacity is determined by your plan. When hot storage is full, the oldest data is automatically migrated to the warm tier. The migration process runs continuously in the background without impacting query performance.
Warm Tier
The warm tier uses high-capacity HDDs with ZSTD compression. Data is reorganized into larger, sequential file blocks optimized for range scans. Warm tier queries are slower than hot tier but significantly faster than cold tier, making this tier suitable for incident investigations that span multiple days.
Cold Tier
The cold tier stores data in columnar Parquet format on object storage (Amazon S3, Google Cloud Storage, or Azure Blob Storage). Queries against cold data are executed on-demand using ephemeral compute instances. Cold queries have higher latency but are billed only for the data scanned, making this tier cost-effective for infrequent access patterns.
Cold data can be rehydrated to the warm tier on demand for faster access during extended investigations. Rehydration typically completes within 5-15 minutes depending on the volume of data.
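The age-based tiering described above can be sketched as a small helper. The day thresholds mirror the default 7/23/60-day split used in the examples below and are illustrative only; the actual migration logic is internal to LogPulse:

```python
from datetime import datetime, timedelta, timezone

def tier_for(log_time, now, hot_days=7, warm_days=23, cold_days=60):
    """Return which tier a log of the given age would live in,
    or 'deleted' once total retention has expired."""
    age = (now - log_time).days
    if age < hot_days:
        return "hot"
    if age < hot_days + warm_days:
        return "warm"
    if age < hot_days + warm_days + cold_days:
        return "cold"
    return "deleted"

now = datetime.now(timezone.utc)
print(tier_for(now - timedelta(days=2), now))    # -> hot
print(tier_for(now - timedelta(days=20), now))   # -> warm
print(tier_for(now - timedelta(days=45), now))   # -> cold
print(tier_for(now - timedelta(days=120), now))  # -> deleted
```

Note that the boundaries are cumulative: a log enters cold storage after hot_days + warm_days, not after cold_days.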
Retention Policies
Retention policies define how long logs are kept before automatic deletion. Each plan includes a default retention period, and you can configure custom retention per index.
| Plan | Daily Ingestion | Hot Retention | Total Retention | Indexes |
|---|---|---|---|---|
| Free | 100 MB/day | 7 days | 7 days | 1 |
| Starter | 1 GB/day | 7 days | 14 days | 5 |
| Pro | 5 GB/day | 14 days | 30 days | 25 |
| Business | 25 GB/day | 30 days | 90 days | 100 |
| Enterprise (coming soon) | Custom | Custom | Up to 365 days | Unlimited |
Custom Retention Per Index
You can set a custom retention period on any index, overriding the plan default. Custom retention can be shorter or longer than the plan default, within the maximum allowed by your plan.
curl -X PUT https://api.logpulse.io/api/v1/indexes/web-access-logs \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"retention": {
"hot_days": 3,
"warm_days": 14,
"cold_days": 60,
"total_days": 77
}
}'

Index Management
Creating Indexes
An index is a logical grouping of logs with shared retention policies and access controls. By default, all logs are stored in the "default" index. Create additional indexes to separate logs by environment, service, team, or compliance requirement.
curl -X POST https://api.logpulse.io/api/v1/indexes \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"name": "security-audit",
"description": "Security and audit trail logs",
"retention": {
"total_days": 90
}
}'

Index Patterns
Index patterns route incoming logs to the correct index based on source, level, or attribute values. Patterns are evaluated in priority order and the first match determines the target index.
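The first-match semantics can be sketched as follows. The match functions here reduce LogPulse's pattern syntax to simple field checks for illustration; they are not the real pattern evaluator:

```python
def route(log, patterns):
    """Return the target index for a log record.

    patterns: list of (match_fn, target, priority) tuples, evaluated
    in ascending priority order; the first match wins.
    """
    for match, target, _priority in sorted(patterns, key=lambda p: p[2]):
        if match(log):
            return target
    return "default"

patterns = [
    (lambda l: l.get("source") in ("auth-service", "firewall")
               or l.get("level") == "security", "security-audit", 1),
    (lambda l: l.get("level") in ("debug", "trace"), "debug-logs", 2),
    (lambda l: True, "default", 100),  # catch-all, lowest priority
]

print(route({"source": "firewall", "level": "info"}, patterns))  # security-audit
print(route({"source": "api", "level": "debug"}, patterns))      # debug-logs
print(route({"source": "api", "level": "info"}, patterns))       # default
```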
# Route security logs to the security-audit index
Pattern: source=auth-service OR source=firewall OR level=security
Target: security-audit
Priority: 1
# Route debug logs to a short-retention index
Pattern: level=debug OR level=trace
Target: debug-logs
Priority: 2
# Everything else goes to the default index
Pattern: *
Target: default
Priority: 100

Index-Level Retention
Each index can have its own retention policy that overrides the organization default. This allows you to keep security audit logs for 90 days while deleting debug logs after 3 days, all within the same LogPulse organization.
| Index | Hot | Warm | Cold | Total | Use Case |
|---|---|---|---|---|---|
| default | 7 days | 23 days | 0 days | 30 days | General application logs |
| security-audit | 14 days | 46 days | 30 days | 90 days | Compliance and audit trail |
| debug-logs | 3 days | 0 days | 0 days | 3 days | Verbose debug output |
| error-logs | 14 days | 46 days | 30 days | 90 days | Error investigation and trending |
| access-logs | 7 days | 23 days | 30 days | 60 days | Web access and request logs |
Data Lifecycle
Logs follow a predictable lifecycle from ingestion to deletion. Understanding this lifecycle helps you plan retention policies and estimate storage costs.
Ingestion
|
v
Hot Storage (NVMe SSD)
- Full columnar indexing
- Sub-second query performance
- Default: 7 days
|
v
Warm Storage (HDD)
- ZSTD compressed
- Sequential scan optimized
- Default: 7-30 days
|
v
Cold Storage (Object Storage)
- Parquet format
- On-demand query execution
- Default: 30-90 days
|
v
Deletion (or Archive)
- Automatic after retention expires
- Optional: export to your own storage before deletion

Tier transitions happen automatically in the background. The migration process is designed to be invisible: queries that span multiple tiers are seamlessly merged, and there is no gap in data availability during transitions.
Storage Costs
LogPulse storage costs vary by tier. The tiered architecture ensures that most of your data is stored at the lowest cost while keeping recent data on fast storage.
| Tier | LogPulse Cost (per GB/month) | Splunk Equivalent | Datadog Equivalent |
|---|---|---|---|
| Hot (SSD) | $0.80 | $2.50 | $1.70 |
| Warm (HDD) | $0.25 | $1.00 | $0.75 |
| Cold (Object) | $0.03 | $0.30 | $0.25 |
| Ingestion | $0.10/GB ingested | $0.50/GB | $0.10/GB |
Cost Estimation Example
Consider an application generating 50 GB/day of raw logs with a 90-day retention policy:
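The worked breakdown that follows can be reproduced with a short script, using the per-GB prices from the cost table above (variable names are illustrative):

```python
raw_per_day = 50                              # GB/day raw ingestion
compression = 10                              # conservative 10:1 ratio
stored_per_day = raw_per_day / compression    # 5 GB/day after compression

hot_days, warm_days, cold_days = 14, 16, 60   # 90-day total retention
hot = stored_per_day * hot_days * 0.80        # hot tier, $/month
warm = stored_per_day * warm_days * 0.25      # warm tier, $/month
cold = stored_per_day * cold_days * 0.03      # cold tier, $/month
ingest = raw_per_day * 30 * 0.10              # ingestion billed on raw GB

total = hot + warm + cold + ingest
print(f"hot=${hot:.2f} warm=${warm:.2f} cold=${cold:.2f} "
      f"ingest=${ingest:.2f} total=${total:.2f}")
# hot=$56.00 warm=$20.00 cold=$9.00 ingest=$150.00 total=$235.00
```

Note that ingestion is billed on raw volume while storage is billed on compressed volume, which is why the ingestion line dominates this example.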
Raw ingestion: 50 GB/day
Compression ratio: 10:1
Stored per day: 5 GB/day
Hot storage (14 days):
5 GB/day x 14 days = 70 GB
70 GB x $0.80/GB = $56.00/month
Warm storage (days 15-30):
5 GB/day x 16 days = 80 GB
80 GB x $0.25/GB = $20.00/month
Cold storage (days 31-90):
5 GB/day x 60 days = 300 GB
300 GB x $0.03/GB = $9.00/month
Ingestion:
50 GB/day x 30 days x $0.10/GB = $150.00/month
Total estimated monthly cost: $235.00/month

Archival & Export
LogPulse can export log data to your own object storage for long-term archival beyond the retention window, compliance preservation, or integration with data lake pipelines.
Supported Destinations
| Destination | Configuration | Format Options |
|---|---|---|
| Amazon S3 | Bucket name, region, IAM role ARN or access key | JSON, CSV, Parquet |
| Google Cloud Storage | Bucket name, service account key JSON | JSON, CSV, Parquet |
| Azure Blob Storage | Container name, storage account, SAS token or managed identity | JSON, CSV, Parquet |
Scheduled Exports
Scheduled exports run automatically on a configured schedule and export logs matching an LPQL query to the target destination. Exports can run hourly, daily, or weekly.
curl -X POST https://api.logpulse.io/api/v1/exports \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"name": "daily-security-export",
"query": "index=security-audit",
"schedule": "daily",
"format": "parquet",
"destination": {
"type": "s3",
"bucket": "company-log-archive",
"prefix": "logpulse/security/",
"region": "us-east-1",
"role_arn": "arn:aws:iam::123456789012:role/logpulse-export"
},
"compression": "zstd",
"partition_by": ["date", "source"]
}'

Exported files are partitioned by date and optionally by additional fields (source, level, etc.) for efficient querying with tools like Athena, BigQuery, or Spark.
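One way to picture the resulting object layout: assuming a hive-style key=value directory scheme (an assumption on our part; the exact object key format is not specified here, though hive-style partitions are what Athena, BigQuery, and Spark discover automatically), the `partition_by` fields would produce prefixes like:

```python
from datetime import date

def partition_prefix(base, partition_by, values):
    """Build a hive-style partition path (key=value segments).
    Hypothetical layout for illustration, not LogPulse's documented format."""
    parts = [f"{key}={values[key]}" for key in partition_by]
    return "/".join([base.rstrip("/")] + parts) + "/"

print(partition_prefix(
    "logpulse/security/",
    ["date", "source"],
    {"date": date(2024, 6, 1).isoformat(), "source": "auth-service"},
))
# logpulse/security/date=2024-06-01/source=auth-service/
```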
Format Options
| Format | Best For | Compression | Typical Size |
|---|---|---|---|
| JSON | General purpose, human-readable, tool compatibility | gzip, zstd | ~2x compressed log size |
| CSV | Spreadsheet analysis, simple tooling | gzip | ~1.5x compressed log size |
| Parquet | Data lake queries (Athena, BigQuery, Spark), columnar analytics | snappy, zstd | ~0.5x compressed log size |
Compliance Retention
LogPulse provides features to help meet common compliance requirements for log retention and data management. Consult your compliance team for specific requirements applicable to your organization.
GDPR - Right to Deletion
To comply with GDPR right-to-erasure requests, LogPulse supports targeted deletion of logs containing specific personal data. Use the data deletion API to remove logs matching a query:
curl -X POST https://api.logpulse.io/api/v1/data/delete \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LOGPULSE_API_KEY" \
-d '{
"query": "attributes.user_id=\"usr_deleted_user\"",
"reason": "GDPR right-to-erasure request #12345",
"dry_run": false
}'

SOC 2 - Audit Trail Retention
SOC 2 Type II audits typically require an audit trail retained for at least 1 year. LogPulse audit logs (user actions, API key usage, configuration changes) are automatically retained for 90 days on current plans (up to 365 days on Enterprise, coming soon). For longer retention, configure archival exports to your own storage. Application logs used for security monitoring should use the maximum retention available on your plan.
HIPAA Requirements
For organizations handling protected health information (PHI), LogPulse offers HIPAA-eligible configurations on the Enterprise plan. This includes encryption at rest (AES-256), encryption in transit (TLS 1.2+), audit logging of all data access, and a signed Business Associate Agreement (BAA).
HIPAA requires a minimum 6-year retention period for certain records. Configure indexes containing PHI-related logs with a 2190-day (6-year) retention policy or set up archival exports to your own HIPAA-compliant storage.
Estimating Storage
Use these guidelines to estimate your storage requirements and plan your LogPulse tier and budget.
Estimating Daily Log Volume
Log volume depends on your application architecture, log verbosity, and traffic patterns. Here are typical ranges by source type:
| Source Type | Typical Volume | Notes |
|---|---|---|
| Web server (access logs) | 1-5 GB/day per server | Depends on request volume; 1 KB per request average |
| Application service | 0.5-2 GB/day per service | Depends on log verbosity; production should use info level |
| Database | 0.1-1 GB/day per instance | Slow query logs and error logs; general logs are higher |
| Kubernetes cluster | 2-10 GB/day per cluster | Depends on pod count and container verbosity |
| Load balancer | 1-5 GB/day per LB | Access logs at ~0.5 KB per request |
| Security / Firewall | 0.5-5 GB/day per device | Depends on traffic volume and rule verbosity |
Compression Ratios
LogPulse applies ZSTD compression to all stored data. Typical compression ratios vary by log format:
| Log Type | Typical Compression Ratio | Effective Storage |
|---|---|---|
| Structured JSON logs | 10:1 to 15:1 | 1 GB raw = 70-100 MB stored |
| Unstructured text logs | 8:1 to 12:1 | 1 GB raw = 85-125 MB stored |
| Access logs (CLF/ELF) | 12:1 to 20:1 | 1 GB raw = 50-85 MB stored |
| Syslog | 10:1 to 14:1 | 1 GB raw = 70-100 MB stored |
For planning purposes, use a conservative 10:1 compression ratio. Actual ratios may be higher, especially for highly repetitive log formats like access logs.
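A back-of-the-envelope estimate combines the per-source volumes above with the conservative 10:1 planning ratio; the stack below is a made-up example:

```python
def stored_gb(raw_gb_per_day, retention_days, compression=10):
    """Steady-state resident storage: daily compressed volume x retention."""
    return raw_gb_per_day / compression * retention_days

# Illustrative stack using midpoints of the per-source ranges above
daily = {
    "web-servers (x3)": 3 * 3.0,   # GB/day
    "app-services (x5)": 5 * 1.0,
    "k8s-cluster": 6.0,
}
raw = sum(daily.values())          # 20.0 GB/day raw
print(raw, stored_gb(raw, 30))     # 20.0 60.0 -> ~60 GB resident at 30-day retention
```

Because this uses the conservative 10:1 ratio, actual resident storage will usually come in lower, especially for access-log-heavy workloads.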
Managing Storage
LogPulse provides tools to monitor and manage your storage usage proactively.
Storage Dashboard
The Storage Dashboard (Settings, then Storage) shows real-time and historical storage usage broken down by index and tier. Key metrics include total stored volume, daily ingestion rate, storage growth trend, and projected time until quota is reached.
Storage Quota Alerts
LogPulse automatically sends notifications when your storage usage approaches plan limits:
| Threshold | Notification | Action Required |
|---|---|---|
| 80% of daily ingestion quota | Email to organization Owners | Review ingestion volume; consider upgrading or reducing log verbosity |
| 90% of daily ingestion quota | Email and dashboard banner | Immediate action recommended; logs may be throttled at 100% |
| 100% of daily ingestion quota | Ingestion throttling begins | Excess logs are queued for up to 1 hour; upgrade plan or reduce volume |
| 80% of total storage quota | Email to organization Owners | Review retention policies; consider shorter retention or archival exports |
| 95% of total storage quota | Email, dashboard banner, and Slack (if configured) | Immediate action required; oldest data may be deleted early to free space |
Cleanup Recommendations
The Storage Dashboard includes automated cleanup recommendations based on your usage patterns:
Reduce debug log retention. If debug or trace level logs consume more than 30% of your storage, consider reducing their retention to 1-3 days or filtering them out during ingestion using ETL pipelines.
Archive before deletion. For indexes approaching retention expiry, configure archival exports to preserve data in your own storage at a lower cost.
Drop unused fields. Large attribute payloads increase storage consumption. Use ETL pipelines to drop fields that are never searched or filtered.
Consolidate duplicate logs. If multiple sources emit the same log event (for example, sidecar and application both logging the same request), deduplicate at the ingestion layer.