This guide provides a comprehensive reference for developers creating filters to extract, enrich, and transform raw log data in UTMStack v11. Filters are YAML files used by the parsing plugin to convert raw events into a standardized format.
Developer Reference: This page is a practical guide to implementing data transformation pipelines with filters.
What are Filters?
Filters define how to extract and transform data from raw events into a standardized format that can be:
- Analyzed by correlation rules
- Searched in Log Explorer
- Visualized in dashboards
- Stored efficiently
Purpose
- Parse raw log formats (JSON, CSV, key-value, free text)
- Extract relevant fields from unstructured data
- Normalize field names across data sources (see the example below)
- Enrich data with additional context
- Transform data types for proper analysis
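As an illustration, here is a simplified before-and-after: a made-up raw event and the standardized fields a filter might produce (all field names on the raw side and all values are invented for the example):

# Raw event as received (invented example)
raw: '{"src":"10.0.0.5","dst":"203.0.113.7","act":"ACCEPT"}'

# Standardized fields after filtering (invented example)
origin.ip: 10.0.0.5
target.ip: 203.0.113.7
action: accept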
Filter Structure
pipeline:
  - dataTypes: # Event types this filter applies to
      - apache
    steps: # Processing steps
      - json: # Step 1: Parse JSON
          source: raw
      - rename: # Step 2: Rename fields
          from:
            - log.host.ip
          to: origin.ip
      # Additional steps...
See the complete documentation for all available filter steps and detailed examples.
View Full Filter Implementation Guide →
Filter Steps Reference
Parsing Steps
| Step | Purpose | Use Case |
|---|---|---|
| json | Parse JSON data | Structured logs from applications |
| grok | Pattern-based parsing | Unstructured text logs (Apache, Syslog) |
| kv | Key-value pair parsing | Simple formatted logs |
| csv | CSV data parsing | Comma-separated log formats |
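The json and grok steps appear in the examples later on this page. The kv step is not shown elsewhere in this guide, so the sketch below is only illustrative; apart from source, the parameter names are assumptions, and the full filter implementation guide has the authoritative syntax.

steps:
  # Illustrative only: parse a line such as "user=jdoe action=login result=ok"
  # fieldSplit and valueSplit are assumed parameter names, not confirmed options
  - kv:
      source: log.message
      fieldSplit: ' '
      valueSplit: '='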
Transformation Steps
| Step | Purpose | Use Case |
|---|---|---|
| rename | Rename fields | Standardize field names |
| cast | Convert data types | Ensure proper types for analysis |
| reformat | Reformat values | Timestamp conversion, string formatting |
| trim | Remove characters | Clean up parsed data |
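A short sketch combining several of these steps, reusing the rename, cast, and reformat syntax shown elsewhere on this page (log.src_ip is an illustrative vendor field name, and the timestamp formats are taken from Pattern 2 below):

steps:
  # Standardize a vendor-specific field name
  - rename:
      from:
        - log.src_ip
      to: origin.ip
  # Make the status code numeric so range comparisons work
  - cast:
      fields:
        - log.statusCode
      to: int
  # Normalize the timestamp into ISO 8601
  - reformat:
      fields: [deviceTime]
      function: time
      fromFormat: 'Jan 02 15:04:05'
      toFormat: '2006-01-02T15:04:05Z'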
Enrichment Steps
| Step | Purpose | Use Case |
|---|---|---|
| add | Add new fields | Add metadata, computed values |
| dynamic | Call external plugins | Geolocation, threat intelligence |
| expand | Expand nested data | Flatten complex structures |
Cleanup Steps
| Step | Purpose | Use Case |
|---|---|---|
| delete | Remove fields | Remove unnecessary data |
Quick Start Example
Here’s a complete filter for Apache access logs:
pipeline:
  - dataTypes:
      - apache
    steps:
      # 1. Parse JSON container
      - json:
          source: raw
      # 2. Extract fields using grok
      - grok:
          patterns:
            - fieldName: origin.ip
              pattern: '{{.ipv4}}|{{.ipv6}}'
            - fieldName: deviceTime
              pattern: '\[{{.data}}\]'
            - fieldName: log.statusCode
              pattern: '{{.integer}}'
          source: log.message
      # 3. Convert to proper types
      - cast:
          fields:
            - log.statusCode
          to: int
      # 4. Add geolocation
      - dynamic:
          plugin: com.utmstack.geolocation
          params:
            source: origin.ip
            destination: origin.geolocation
          where: exists(origin.ip)
      # 5. Normalize action
      - add:
          function: 'string'
          params:
            key: action
            value: 'get'
          where: safe(log.method, "") == "GET"
      # 6. Clean up
      - delete:
          fields:
            - raw
            - log.message
          where: exists(action)
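After these steps run on a matching Apache event, the stored document contains roughly the fields below. The values are illustrative, not taken from real output; the exact result depends on the raw event:

# Illustrative result only
origin.ip: 203.0.113.10
deviceTime: '12/Oct/2025:14:05:32 +0000'
log.statusCode: 200
origin.geolocation: {}   # populated by com.utmstack.geolocation when origin.ip exists
action: get
# raw and log.message were removed by the delete step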
Development Workflow
1. Identify Data Source: determine what log source you need to process.
2. Analyze Raw Format: examine sample raw events to understand structure.
3. Create Filter File: start with basic parsing steps.
4. Add Transformations: normalize fields and data types.
5. Enrich Data: add geolocation, classifications.
6. Test Filter: deploy and test with sample data.
7. Optimize: remove unnecessary fields, improve performance.
Best Practices
Standardize Field Names
- Use consistent naming across all filters
- Follow UTMStack field mapping conventions
- Common fields: origin.ip, target.ip, deviceTime, action, actionResult (see the sketch below)
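For example, a couple of rename steps that map vendor-specific names onto this schema (log.src_ip and log.dst_ip are illustrative source fields, not required names):

steps:
  # Map vendor-specific field names onto the standard schema
  - rename:
      from:
        - log.src_ip
      to: origin.ip
  - rename:
      from:
        - log.dst_ip
      to: target.ip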
Remove Unnecessary Data
- Delete fields not needed for analysis
- Reduces storage requirements
- Improves query performance
Handle Missing Data
- Use conditional steps with where clauses (see the sketch below)
- Test with incomplete/malformed data
- Provide sensible defaults
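A minimal sketch built from the where and safe() syntax used in the Quick Start example; the default value and field names are illustrative:

steps:
  # Provide a default action when log.method is missing or empty
  - add:
      function: 'string'
      params:
        key: action
        value: 'unknown'
      where: safe(log.method, "") == ""

Any step shown with a where clause on this page (add, dynamic, delete) can likewise be gated on exists() so it is skipped when the field was never parsed.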
Optimize Performance
- Apply heavy operations conditionally
- Use efficient parsing methods
- Delete unnecessary fields early in the pipeline (see the sketch below)
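For instance, gating an expensive enrichment on exists() and dropping the raw payload as soon as it is no longer needed, reusing the dynamic and delete syntax from the Quick Start example:

steps:
  # Run geolocation only when there is an IP to look up
  - dynamic:
      plugin: com.utmstack.geolocation
      params:
        source: origin.ip
        destination: origin.geolocation
      where: exists(origin.ip)
  # Drop the raw payload once parsing has produced the fields we need
  - delete:
      fields:
        - raw
      where: exists(origin.ip)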
Document Filters
- Comment complex patterns
- Explain transformation logic
- Note data source requirements
Common Patterns
Pattern 1: Web Server Logs
steps:
  - grok:
      patterns:
        - fieldName: origin.ip
          pattern: '{{.ipv4}}'
        - fieldName: log.method
          pattern: '{{.word}}'
        - fieldName: origin.path
          pattern: '{{.data}}'
        - fieldName: log.statusCode
          pattern: '{{.integer}}'
      source: log.message
  - cast:
      fields: [log.statusCode]
      to: int
  - add:
      function: 'string'
      params:
        key: actionResult
        value: 'success'
      where: safe(log.statusCode, 0) >= 200 && safe(log.statusCode, 0) < 300
Pattern 2: Syslog Parsing
steps:
  - grok:
      patterns:
        - fieldName: deviceTime
          pattern: '{{.monthName}}\s+{{.monthDay}}\s+{{.time}}'
        - fieldName: origin.host
          pattern: '{{.word}}'
        - fieldName: log.program
          pattern: '{{.word}}'
        - fieldName: log.message
          pattern: '{{.greedy}}'
      source: raw
  - reformat:
      fields: [deviceTime]
      function: time
      fromFormat: 'Jan 02 15:04:05'
      toFormat: '2006-01-02T15:04:05Z'
Pattern 3: JSON with Nested Data
steps:
  - json:
      source: raw
  - expand:
      source: log.metadata
      to: log.expandedMetadata
      where: exists(log.metadata)
  - rename:
      from: [log.expandedMetadata.userId]
      to: origin.user
  - delete:
      fields: [log.metadata]
      where: exists(log.expandedMetadata)
Troubleshooting
Filter Not Processing
Check: Event has correct dataType field matching filter configuration
Check: Field names in grok patterns match exactly, patterns are correct
Type Conversion Errors
Check: Field exists before casting, target type is appropriate
Performance Issues
Check: Remove unnecessary fields early, use conditional steps, optimize grok patterns