Jitsu is an open-source event collection platform and Segment alternative that helps you build robust data pipelines from your applications to your data warehouse. Unlike managed solutions, Jitsu gives you full control over your data while providing enterprise-grade features for schema management, transformations, and delivery guarantees. Backed by Y Combinator (S20) and licensed under MIT, Jitsu is designed for modern data teams who need real-time data pipelines.

This guide covers everything you need to deploy Jitsu and build production-ready event pipelines.

What is Jitsu?

Jitsu is a modern event collection and routing platform with several key capabilities:

  • Event collection: SDKs for web, mobile, and server-side event tracking
  • Real-time processing: Transform and enrich events as they flow through using JavaScript functions
  • Multiple destinations: Route events to warehouses (ClickHouse, BigQuery, Snowflake, Postgres, Redshift), analytics tools, and custom endpoints
  • Schema management: Automatic schema inference with JSON flattening and type detection
  • Self-hosted: Run on your infrastructure with full data control, or use Jitsu Cloud (free up to 200k events/month)
  • Identity stitching: Automatic real-time identity graph construction
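
Conceptually, identity stitching merges the identifiers seen for one person (anonymous IDs, user IDs across devices) into a single graph. The sketch below is not Jitsu's internal implementation; it is a minimal union-find illustration of the idea, with all names (IdentityGraph, link, canonical) invented for this example:

```javascript
// Minimal union-find sketch of identity stitching (illustrative only;
// not Jitsu's internal implementation). Each anonymous ID or user ID is
// a node; identify() calls link nodes into one canonical identity.
class IdentityGraph {
  constructor() {
    this.parent = new Map(); // node -> parent node
  }

  // Find the canonical (root) identity for an ID, with path compression.
  canonical(id) {
    if (!this.parent.has(id)) this.parent.set(id, id);
    let root = id;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    // Path compression: point every node on the chain straight at the root.
    let node = id;
    while (this.parent.get(node) !== root) {
      const next = this.parent.get(node);
      this.parent.set(node, root);
      node = next;
    }
    return root;
  }

  // Merge two identities, e.g. when identify() arrives for an anonymous ID.
  link(a, b) {
    this.parent.set(this.canonical(a), this.canonical(b));
  }
}

const graph = new IdentityGraph();
graph.link('anon-1', 'user_123'); // identify() stitches anon-1 to user_123
graph.link('anon-2', 'anon-1');   // a second device joins the same graph
console.log(graph.canonical('anon-2') === graph.canonical('user_123')); // true
```

Once two IDs share a canonical identity, events recorded under either can be attributed to the same user.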

Architecture Overview

Jitsu 2.0 consists of several specialized components:

  • Console: Web UI for configuration and monitoring, built with Next.js
  • Ingest: Events ingestion API that receives and validates incoming events, written in Go
  • Rotor: Function execution engine that runs JavaScript transformations, built with Node.js
  • Bulker: Batch loading service for warehouse destinations with automatic schema management, written in Go
  • Syncctl: Connector orchestration service for running source syncs (requires Kubernetes)

Supporting services include PostgreSQL (configuration storage), Redis or MongoDB (caching and identity stitching), and optionally Kafka (message queuing for high-volume deployments).

Deployment Options

Jitsu can be deployed in several ways depending on your scale and operational requirements.

Docker Compose (Development/Small Scale)

The fastest way to install Jitsu is using Docker Compose:

# Clone the repository
git clone --depth 1 https://github.com/jitsucom/jitsu
cd jitsu/docker

# Copy and configure environment variables
cp .env.example .env
# Edit .env file with your settings

# Start Jitsu
docker compose up -d

Key environment variables to configure in .env:

# Required tokens (generate unique values)
CONSOLE_TOKEN=your-console-token
BULKER_TOKEN=your-bulker-token

# Database credentials
POSTGRES_PASSWORD=your-postgres-password
REDIS_PASSWORD=your-redis-password

# Optional: seed admin user
SEED_USER_EMAIL=admin@example.com
SEED_USER_PASSWORD=your-admin-password

The default Docker Compose setup includes all necessary services: Console, Ingest, Rotor, Bulker, PostgreSQL, Redis, and optionally Kafka and ClickHouse.

Kubernetes (Production)

For production deployments, Jitsu can run on Kubernetes. Note that there is no official Helm chart maintained by Jitsu. However, community-maintained options exist:

# Community Helm chart (stafftastic/jitsu-chart)
helm install jitsu oci://registry-1.docker.io/stafftasticcharts/jitsu \
  --namespace analytics \
  --create-namespace \
  -f values.yaml

Alternatively, you can deploy individual components using the official Docker images:

  • jitsucom/console - Management UI
  • jitsucom/ingest - Event ingestion API
  • jitsucom/rotor - Function execution
  • jitsucom/bulker - Warehouse loading
  • jitsucom/syncctl - Connector orchestration

Key production considerations:

  • Use managed databases (RDS, Cloud SQL) for PostgreSQL
  • Use managed Redis (ElastiCache, Cloud Memorystore) or MongoDB Atlas
  • Configure horizontal pod autoscaling based on CPU/memory
  • Set up ingress with TLS termination
  • Syncctl requires a Kubernetes cluster to run connector sync jobs

Jitsu Cloud

A managed cloud version is available at use.jitsu.com:

  • Free up to 200,000 events per month
  • Includes a free ClickHouse instance
  • No infrastructure management required

Resource Sizing

  • Small (< 1M events/day): 2 CPU, 4GB RAM per service
  • Medium (1-10M events/day): 4 CPU, 8GB RAM, 2+ replicas
  • Large (10M+ events/day): 8 CPU, 16GB RAM, 4+ replicas with autoscaling, Kafka recommended

SDK Integration

Jitsu provides SDKs for multiple platforms. Note that Jitsu 2.0 uses different packages than Jitsu Classic.

JavaScript SDK (Web) - Jitsu 2.0

// Install via npm
npm install @jitsu/js

// Initialize the client
import { jitsuAnalytics } from '@jitsu/js';

const jitsu = jitsuAnalytics({
  host: 'https://your-jitsu-instance.com',
  writeKey: 'your-write-key',
  debug: process.env.NODE_ENV === 'development'
});

// Track events
jitsu.track('Button Clicked', {
  button_id: 'signup-cta',
  page_path: window.location.pathname
});

// Identify users
jitsu.identify('user_123', {
  email: 'user@example.com',
  plan: 'pro',
  created_at: '2025-01-15T10:30:00Z'
});

// Track page views
jitsu.page({
  title: document.title,
  url: window.location.href,
  referrer: document.referrer
});

HTML Snippet (No Build Step)

<script src="https://your-jitsu-instance.com/s/lib.js"
        data-write-key="your-write-key"></script>
<script>
  // jitsu is available globally
  jitsu.track('Page Loaded');
</script>

React Integration

// Install packages
npm install @jitsu/js @jitsu/react

// Using the React provider
import { JitsuProvider, useJitsu } from '@jitsu/react';
import { jitsuAnalytics } from '@jitsu/js';

const jitsuClient = jitsuAnalytics({
  host: 'https://your-jitsu-instance.com',
  writeKey: 'your-write-key'
});

function App() {
  return (
    <JitsuProvider client={jitsuClient}>
      <YourApp />
    </JitsuProvider>
  );
}

function SignupButton() {
  const { track } = useJitsu();

  const handleClick = () => {
    track('Signup Button Clicked', {
      location: 'header',
      variant: 'primary'
    });
  };

  return <button onClick={handleClick}>Sign Up</button>;
}

Server-Side SDK (Node.js)

// The @jitsu/js package is isomorphic and works in Node.js
import { jitsuAnalytics } from '@jitsu/js';

const jitsu = jitsuAnalytics({
  host: 'https://your-jitsu-instance.com',
  writeKey: 'server-write-key'
});

// Track server-side events
async function trackPurchase(userId, order) {
  await jitsu.track('Order Completed', {
    user_id: userId,
    order_id: order.id,
    total: order.total,
    currency: order.currency,
    items: order.items.map(item => ({
      product_id: item.productId,
      quantity: item.quantity,
      price: item.price
    }))
  });
}

HTTP API (Any Language)

# Direct HTTP API call
curl -X POST https://your-jitsu-instance.com/api/s/s2s/track \
  -H "Content-Type: application/json" \
  -H "X-Write-Key: your-write-key" \
  -d '{
    "type": "track",
    "event": "API Event",
    "userId": "user_123",
    "properties": {
      "source": "backend",
      "value": 42
    },
    "context": {
      "library": {
        "name": "http-api",
        "version": "1.0.0"
      }
    }
  }'

Segment Proxy Mode

Jitsu can act as a Segment-compatible endpoint, allowing you to migrate from Segment without changing your existing SDK implementation.

Destination Configuration

Jitsu supports multiple destination types configured through the Console UI or API.

Data Warehouse Destinations

ClickHouse (recommended for high-volume analytics):

  • Native batch loading support
  • Automatic table creation and schema evolution
  • Free ClickHouse instance included with Jitsu Cloud

BigQuery:

  • Streaming and batch modes
  • Automatic partitioning by event time
  • Clustering support for query optimization

PostgreSQL:

  • COPY command for efficient batch loading
  • Automatic schema management
  • Good for smaller deployments

Snowflake:

  • Stage-based loading
  • Automatic warehouse scaling support

Amazon Redshift:

  • S3-based batch loading
  • Schema inference and management

Analytics Tool Destinations

  • Amplitude: Forward events for product analytics
  • Mixpanel: Real-time event streaming
  • Google Analytics 4: Server-side measurement protocol
  • Facebook Conversion API: Server-side conversion tracking
  • HubSpot: CRM event integration

Custom Webhooks

Send events to any HTTP endpoint with configurable batching, headers, and retry policies.
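
To make the batching behavior concrete, here is a minimal sketch of how a batched webhook request might be assembled. The payload shape ({ batch: [...] }) and all function names are assumptions for illustration, not Jitsu's actual webhook wire format:

```javascript
// Sketch of batched webhook delivery (illustrative; the payload shape and
// names here are assumptions, not Jitsu's actual webhook format).
function makeBatcher({ maxBatchSize = 100, headers = {} } = {}) {
  const queue = [];
  const batches = []; // assembled HTTP requests, ready to send

  return {
    add(event) {
      queue.push(event);
      if (queue.length >= maxBatchSize) this.flush(); // size-triggered flush
    },
    flush() {
      if (queue.length === 0) return;
      batches.push({
        method: 'POST',
        headers: { 'Content-Type': 'application/json', ...headers },
        body: JSON.stringify({ batch: queue.splice(0, queue.length) }),
      });
    },
    batches,
  };
}

const batcher = makeBatcher({ maxBatchSize: 2, headers: { 'X-Auth': 'secret' } });
batcher.add({ type: 'track', event: 'A' });
batcher.add({ type: 'track', event: 'B' }); // hits maxBatchSize, auto-flush
batcher.add({ type: 'track', event: 'C' });
batcher.flush();                            // flush the partial batch
console.log(batcher.batches.length); // 2
```

A production implementation would also flush on a timer and apply the retry policy described later in this guide.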

Jitsu Functions (Transformations)

Jitsu Functions allow you to transform, filter, or enrich events using JavaScript before they reach destinations.

Function Structure

// Basic function structure
export default async function transform(event, context) {
  // Access context utilities
  const { log, props, store, geo, ua, fetch } = context;

  // Filter: return "drop" to discard event
  if (event.properties?.internal === true) {
    return "drop";
  }

  // Enrich with geo and user agent data
  event.properties = event.properties || {};
  event.properties.country = geo?.country;
  event.properties.browser = ua?.browser?.name;
  event.properties.processed_at = new Date().toISOString();

  // Return modified event
  return event;
}

Context Object

The context object provides several utilities:

  • log: Logging function for debugging
  • props: Custom properties configured for the function
  • store: Persistent key-value storage between function calls
  • geo: Geolocation data based on IP address
  • ua: Parsed user agent information
  • fetch: HTTP client for external API calls
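
As an example of combining these utilities, the sketch below enriches events with company data looked up over HTTP via context.fetch. The lookup URL, response shape, and enrichment service are hypothetical; in a deployed function this would be the default export:

```javascript
// Enrichment via context.fetch. The lookup URL and response shape below are
// hypothetical; substitute your own service.
async function transform(event, { fetch, log }) {
  const domain = event.properties?.email_domain;
  if (!domain) return event; // nothing to enrich

  try {
    const res = await fetch(`https://enrich.example.com/companies/${domain}`);
    if (res.ok) {
      const company = await res.json();
      event.properties.company_name = company.name;
    }
  } catch (e) {
    log?.warn?.(`enrichment failed: ${e.message}`); // never fail the event
  }
  return event;
}

// Exercise with a stubbed context (no network needed):
const stubContext = {
  fetch: async () => ({ ok: true, json: async () => ({ name: 'Acme Inc' }) }),
  log: { warn: console.warn },
};
transform({ type: 'track', properties: { email_domain: 'acme.com' } }, stubContext)
  .then((e) => console.log(e.properties.company_name)); // prints "Acme Inc"
```

Note the try/catch: a lookup failure degrades to an un-enriched event rather than blocking the pipeline.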

Common Use Cases

// PII Masking
export default async function transform(event, { log }) {
  if (event.properties?.email) {
    event.properties.email_domain = event.properties.email.split('@')[1];
    delete event.properties.email;
  }
  return event;
}

// Event Normalization
export default async function transform(event) {
  // Normalize event types
  if (['page_view', 'pageview', 'pageView'].includes(event.type)) {
    event.type = 'page';
  }
  return event;
}

// Change Destination Table
export default async function transform(event) {
  // Route different event types to different tables
  if (event.type === 'purchase') {
    event.JITSU_TABLE_NAME = 'purchases';
  }
  return event;
}

// Using Persistent Storage
export default async function transform(event, { store }) {
  const userId = event.userId;
  const existingUser = await store.get(`user:${userId}`);

  if (!existingUser) {
    await store.set(`user:${userId}`, { first_seen: new Date().toISOString() });
    event.properties = event.properties || {};
    event.properties.is_new_user = true;
  }


  return event;
}

NPM Package Support

Functions can import NPM packages. Use the Jitsu CLI to bundle and deploy:

# Create a new function project
npx create-jitsu-app --name my-function

# Build and deploy
npm run build
jitsu-cli login
npm run deploy

Schema Management

Jitsu provides automatic schema management through its Bulker component.

Automatic Schema Inference

  • JSON Flattening: A nested object like {"a": {"b": 1}} becomes a column a_b
  • Type Detection: Column types are inferred from JSON values
  • Schema Evolution: New columns are added automatically when new fields appear
  • Array Handling: Arrays are stored as JSON or array types depending on the destination
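
The flattening rule can be sketched as follows; this illustrates the behavior described above, not Bulker's actual implementation:

```javascript
// Sketch of JSON flattening (illustrative; not Bulker's actual code).
// Nested keys join with "_" to form column names.
function flatten(obj, prefix = '') {
  const columns = {};
  for (const [key, value] of Object.entries(obj)) {
    const name = prefix ? `${prefix}_${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      Object.assign(columns, flatten(value, name)); // recurse into nested object
    } else {
      columns[name] = value; // scalars and arrays become columns directly
    }
  }
  return columns;
}

console.log(flatten({ a: { b: 1 }, tags: ['x', 'y'], name: 'test' }));
// → { a_b: 1, tags: [ 'x', 'y' ], name: 'test' }
```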

Explicit Type Hints

// Use __sql_type_ prefix for explicit typing
{
  "revenue": 99.99,
  "__sql_type_revenue": "decimal(10,2)",
  "description": "A short text",
  "__sql_type_description": "varchar(500)"
}

Table Naming

  • Default table name is events
  • Override per-event using JITSU_TABLE_NAME property
  • Configure table name templates in destination settings

Monitoring and Observability

Production deployments require comprehensive monitoring to ensure reliability.

Console Dashboard

The Jitsu Console provides built-in monitoring:

  • Live Events viewer for real-time debugging
  • Event volume metrics per source and destination
  • Function execution logs
  • Error tracking and alerts

Health Endpoints

# Liveness probe
GET /health/live

# Readiness probe
GET /health/ready

# API status
GET /api/status

Prometheus Metrics

Jitsu exposes Prometheus-compatible metrics at /metrics:

  • Events received/processed/failed counters
  • Processing latency histograms
  • Destination queue sizes
  • Function execution metrics

Logging

  • Set log level via environment variable (LOG_LEVEL=info|debug|warn|error)
  • JSON-formatted logs for easy parsing
  • Correlation IDs for request tracing

Error Handling and Reliability

Delivery Guarantees

  • At-least-once delivery: Events are retried until successfully delivered
  • Kafka buffering: High-volume deployments use Kafka for durable message queuing
  • Local buffering: Events are buffered locally if destinations are temporarily unavailable

Retry Behavior

  • Automatic retries with exponential backoff
  • Configurable retry attempts and delays
  • Retryable errors (timeouts, rate limits) vs permanent failures

Dead Letter Handling

Events that fail all retry attempts can be:

  • Logged for manual inspection
  • Sent to a separate destination for analysis
  • Replayed after fixing the underlying issue
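
The retry and dead-letter behavior described above can be sketched as follows; the function names, configuration keys, and defaults here are illustrative, not Jitsu's actual settings:

```javascript
// Sketch of retry-with-exponential-backoff plus dead-letter fallback
// (illustrative; Jitsu's real retry policy is configured per destination).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function deliver(event, send, deadLetters, { maxAttempts = 5, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await send(event);
      return true;
    } catch (err) {
      if (err.permanent) break; // e.g. 4xx: retrying will not help
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1)); // 100, 200, 400, ...
      }
    }
  }
  deadLetters.push(event); // exhausted retries: keep for inspection/replay
  return false;
}

// A destination that times out twice, then succeeds:
let calls = 0;
const flaky = async () => { if (++calls < 3) throw new Error('timeout'); };
const dlq = [];
deliver({ event: 'Order Completed' }, flaky, dlq)
  .then((ok) => console.log(ok, dlq.length)); // true 0
```

Distinguishing retryable from permanent errors (here via an err.permanent flag) avoids hammering a destination that will never accept the event.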

Security Considerations

Authentication

  • Use separate write keys for different sources (web, mobile, server)
  • Write keys can be scoped to specific destinations
  • Console access is protected with user authentication
  • API tokens for programmatic access

Data Privacy

  • Use Functions to mask or remove PII before storage
  • IP anonymization support for GDPR compliance
  • Self-hosted deployment keeps all data on your infrastructure
  • Configure data retention policies per destination

Network Security

  • TLS encryption for all connections
  • Custom domain support to minimize ad-blocker impact
  • VPC deployment options for private networks

Migration from Segment

Jitsu is designed as a Segment alternative with migration paths:

Segment Proxy Mode

Point your existing Segment SDK to Jitsu's Segment-compatible endpoint:

analytics.load("your-jitsu-write-key", {
  integrations: {
    "Segment.io": {
      apiHost: "your-jitsu-instance.com/api/s/segment"
    }
  }
});

SDK Migration

Replace Segment SDK with Jitsu SDK—the API is largely compatible:

  • analytics.track() → jitsu.track()
  • analytics.identify() → jitsu.identify()
  • analytics.page() → jitsu.page()
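
If you cannot change every call site at once, a thin adapter can keep the familiar analytics.* surface while delegating to the Jitsu client. This shim is a hypothetical migration aid, not part of either SDK:

```javascript
// Hypothetical migration shim: preserves the analytics.* call surface while
// delegating to a Jitsu client, so call sites can migrate incrementally.
function segmentShim(jitsu) {
  return {
    track: (event, properties) => jitsu.track(event, properties),
    identify: (userId, traits) => jitsu.identify(userId, traits),
    page: (properties) => jitsu.page(properties),
  };
}

// Exercise with a recording stub in place of a real Jitsu client:
const recorded = [];
const stubJitsu = {
  track: (e, p) => recorded.push(['track', e, p]),
  identify: (u, t) => recorded.push(['identify', u, t]),
  page: (p) => recorded.push(['page', p]),
};
const analytics = segmentShim(stubJitsu);
analytics.track('Signed Up', { plan: 'pro' });
console.log(recorded[0][0]); // "track"
```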

Summary

Building event pipelines with Jitsu provides:

  1. Data ownership: Full control over your event data with self-hosting option
  2. Flexibility: Route to any destination with powerful JavaScript transformations
  3. Reliability: At-least-once delivery with automatic retries and buffering
  4. Scalability: Handle millions of events per day with horizontal scaling
  5. Cost efficiency: MIT-licensed, avoid per-event pricing of managed solutions
  6. Developer experience: Clean APIs, comprehensive SDKs, and debugging tools

Start with Docker Compose for development, migrate to Kubernetes for production, or use Jitsu Cloud for a fully managed experience. With proper monitoring and schema governance, Jitsu becomes the foundation of a reliable, privacy-respecting analytics infrastructure.

Resources