Introducing Visual ETL Pipelines
Today we are launching Visual ETL Pipelines, a drag-and-drop pipeline editor that lets you build, test, and deploy log transformation workflows without writing code. If you have ever wrestled with Logstash configuration files, hand-rolled regex parsers, or maintained a brittle chain of shell scripts to clean up your logs before ingestion, this feature was built for you.
The Problem
Log data is messy. Application logs, infrastructure metrics, security events, and audit trails all arrive in different formats. Before you can search, alert, or visualize that data, you need to transform it: parse structured fields out of unstructured text, mask sensitive information, enrich events with metadata, and normalize schemas so that logs from different sources can be queried together.
Traditionally, this means writing Logstash pipelines, Fluentd configuration, or custom ETL scripts. These approaches work, but they share a common problem: the transformation logic is buried in configuration files that are difficult to reason about, hard to test, and painful to debug when something breaks at 3am.
A Visual Approach
LogPulse Visual ETL Pipelines replace configuration files with a visual canvas. The editor is built on @xyflow/react (ReactFlow), a production-grade library for node-based UIs. You build pipelines by dragging nodes onto the canvas and connecting them with edges. Each node performs a single, well-defined operation on the data flowing through it.
The pipeline engine itself lives in the packages/etl-engine package. When you deploy a pipeline, the engine traverses the node graph and processes each event sequentially through the chain. Every node receives the output of the previous node, transforms it, and passes the result downstream.
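The traversal described above can be sketched in a few lines. This is an illustrative reduction of what a linear chain does, not the actual packages/etl-engine code, which also has to handle branching, loops, and error paths; the type and function names here are assumptions.

```typescript
// Hypothetical sketch of a linear pipeline run: each node function receives
// the previous node's output and returns its own transformed event.
type LogEvent = Record<string, unknown>;
type NodeFn = (event: LogEvent) => LogEvent;

// Run one event through an ordered chain of node functions.
function runChain(nodes: NodeFn[], event: LogEvent): LogEvent {
  return nodes.reduce((current, node) => node(current), event);
}

// Example: parse a raw JSON line, then tag the event with a source field.
const parseJson: NodeFn = (e) => ({ ...e, ...JSON.parse(String(e.raw)) });
const addSource: NodeFn = (e) => ({ ...e, source: "app" });

const result = runChain([parseJson, addSource], { raw: '{"level":"error"}' });
// result now carries both the parsed level field and the added source field
```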
18 Node Types Across 6 Categories
We designed the node library to cover the vast majority of log transformation scenarios without requiring custom code. There are 18 node types organized into 6 categories:
Trigger nodes define how a pipeline starts. The start node is the entry point for every pipeline.

Extract nodes pull data from external sources -- httpRequest fetches data from APIs, and splunkSearch queries existing Splunk deployments for migration scenarios.
Transform nodes are the core of any pipeline. The transform node applies arbitrary field mappings and transformations. The json and csv nodes parse structured data formats. The redactMask node detects and masks PII fields like email addresses, credit card numbers, and social security numbers -- essential for compliance with data privacy regulations. The fieldOperations node provides fine-grained control over individual fields, and mapCommonSchema normalizes log events into a standard schema.
Flow control nodes handle branching and iteration. The condition node evaluates expressions and routes events to different branches based on the result. The loop, loopStart, loopEnd, and loopBreak nodes let you iterate over arrays within a single event -- useful for processing batch payloads that contain multiple log entries.
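Conceptually, the loop nodes fan an array field out into per-item events, run the loop body on each, and collect the results back onto the parent event. A minimal sketch of that idea, with illustrative names that are not the engine's actual API:

```typescript
// Hypothetical sketch of loopStart/loopEnd semantics: apply a body function
// to every element of an array field on the event.
type LogEvent = Record<string, unknown>;

function loopOver(
  event: LogEvent,
  arrayField: string,
  body: (item: LogEvent) => LogEvent
): LogEvent {
  const items = (event[arrayField] as LogEvent[]) ?? [];
  // Replace the array field with the transformed items; the rest of the
  // parent event passes through unchanged.
  return { ...event, [arrayField]: items.map(body) };
}

// Example: a batch payload with three entries; uppercase each level field.
const batch = {
  batch_id: "b1",
  entries: [{ level: "warn" }, { level: "error" }, { level: "info" }],
};
const processed = loopOver(batch, "entries", (e) => ({
  ...e,
  level: String(e.level).toUpperCase(),
}));
```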
Load nodes push processed data to destinations. The logpulseIngest node sends events directly to LogPulse for indexing. The lookup node enriches events by joining against external reference data -- for example, mapping IP addresses to geographic locations or user IDs to team names.
Utility nodes round out the library: log writes diagnostic output for debugging, and error handles failure cases gracefully.
Key Nodes in Depth
The redactMask node deserves special attention. When you process logs that may contain PII -- and in practice, almost all logs do -- you need to ensure sensitive data is masked before it reaches your search index. The redactMask node uses pattern matching to identify common PII formats and replaces them with masked values. You configure which patterns to match and how aggressively to mask, and the node handles the rest. This is not a nice-to-have; for teams operating under GDPR, HIPAA, or SOC 2, it is a requirement.
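To make the pattern-matching idea concrete, here is a minimal sketch of regex-based masking. The patterns and the `[REDACTED:...]` replacement format are assumptions for illustration; the real redactMask node's detection rules and configuration options may differ.

```typescript
// Illustrative PII patterns -- simplified; production detection is stricter.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[\w.+-]+@[\w-]+\.[\w.]+/g,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
};

// Replace every match of every configured pattern with a labeled mask.
function redactMask(text: string, patterns = PII_PATTERNS): string {
  let out = text;
  for (const [name, re] of Object.entries(patterns)) {
    out = out.replace(re, `[REDACTED:${name}]`);
  }
  return out;
}

const masked = redactMask("login by alice@example.com, ssn 123-45-6789");
// masked: "login by [REDACTED:email], ssn [REDACTED:ssn]"
```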
The condition node enables branching logic. You define an expression -- for example, "level equals error" or "status_code greater than 499" -- and the node routes events to different downstream paths based on the evaluation result. This lets you build pipelines that handle different log types differently: errors get enriched with stack trace parsing, warnings get sampled, and debug logs get dropped entirely.
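The routing described above can be sketched as a predicate over the event. The condition node evaluates a textual expression like "status_code greater than 499"; this sketch hard-codes the equivalent checks in TypeScript for illustration, and the branch names are hypothetical.

```typescript
// Minimal sketch of condition-style routing over a log event.
type LogEvent = Record<string, unknown>;

function route(event: LogEvent): "errorBranch" | "sampleBranch" | "drop" {
  // "level equals error" or "status_code greater than 499"
  if (event.level === "error" || Number(event.status_code) > 499) {
    return "errorBranch"; // enrich with stack trace parsing
  }
  if (event.level === "warn") {
    return "sampleBranch"; // sample warnings downstream
  }
  return "drop"; // debug and everything else is discarded
}
```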
The lookup node connects your pipeline to external data sources for enrichment. You provide a reference dataset (a CSV file, an API endpoint, or another LogPulse index), define the join key, and the node adds matching fields to each event as it passes through.
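In essence, this is a left join against a reference table. A sketch of that join, assuming the reference dataset has already been loaded into memory; the field names here are illustrative:

```typescript
// Sketch of lookup-style enrichment against an in-memory reference table.
type LogEvent = Record<string, unknown>;

// Reference data: user_id -> team name (in practice loaded from a CSV,
// an API endpoint, or another index).
const teams = new Map<string, string>([
  ["u-101", "payments"],
  ["u-102", "platform"],
]);

function lookup(event: LogEvent, joinKey: string, field: string): LogEvent {
  const match = teams.get(String(event[joinKey]));
  // Unmatched events pass through unchanged.
  return match === undefined ? event : { ...event, [field]: match };
}

const enriched = lookup({ user_id: "u-101", msg: "login" }, "user_id", "team");
// enriched gains a team field; the original fields are preserved
```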
Trigger Modes
Pipelines can be triggered in three ways. Manual triggers let you run a pipeline on demand from the UI, which is ideal for testing and one-off migrations. Scheduled triggers run pipelines on a cron schedule -- for example, every 5 minutes or once a day. Webhook triggers expose an HTTP endpoint that external systems can POST to, enabling real-time ingestion from services that support webhook delivery.
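The three modes map naturally to a small configuration shape. The field names and schema below are hypothetical, chosen only to illustrate the distinction; they are not the product's actual trigger schema.

```typescript
// Hypothetical trigger configurations for the three modes.
type Trigger =
  | { mode: "manual" }
  | { mode: "scheduled"; cron: string }   // standard 5-field cron expression
  | { mode: "webhook"; path: string };    // HTTP endpoint external systems POST to

const every5Min: Trigger = { mode: "scheduled", cron: "*/5 * * * *" };
const daily: Trigger = { mode: "scheduled", cron: "0 0 * * *" };
const hook: Trigger = { mode: "webhook", path: "/hooks/access-logs" };
```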
A Real-World Example
Consider a common scenario: your application emits JSON-formatted access logs that contain user email addresses in the request body. You need to parse the JSON, mask the email addresses for compliance, enrich each event with the deployment region from a lookup table, and ingest the result into LogPulse.
In the visual editor, this is a five-node pipeline: start (trigger), json (parse the raw log), redactMask (mask email addresses), lookup (add deployment region), and logpulseIngest (write to LogPulse). You connect the nodes with edges, configure each one through its settings panel, and click deploy. The entire process takes under five minutes.
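Under the hood, a pipeline like this boils down to a node-and-edge graph, the shape ReactFlow-style editors typically serialize to. A sketch of what that five-node graph might look like; the config field names are illustrative, not LogPulse's actual pipeline schema:

```typescript
// The five-node example pipeline as a plain node/edge graph (hypothetical
// serialization format).
const pipeline = {
  nodes: [
    { id: "1", type: "start" },
    { id: "2", type: "json", config: { sourceField: "raw" } },
    { id: "3", type: "redactMask", config: { patterns: ["email"] } },
    { id: "4", type: "lookup", config: { joinKey: "host", addField: "region" } },
    { id: "5", type: "logpulseIngest" },
  ],
  edges: [
    { source: "1", target: "2" },
    { source: "2", target: "3" },
    { source: "3", target: "4" },
    { source: "4", target: "5" },
  ],
};
```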
We believe log transformation should be a visual, testable, and collaborative process -- not a solo exercise in regex debugging. Visual ETL Pipelines are available today for all LogPulse users. Head to the Pipelines tab in your workspace to get started.