The Challenge
MarketHub (name changed) is a two-sided marketplace connecting service providers with customers. With 2M monthly active users and 500M events per month, their BigQuery costs were spiraling, and query performance was degrading.
Key Problems
- Slow dashboards: Executive dashboards took 30+ seconds to load
- High costs: BigQuery on-demand pricing hit $15K/month
- Limited real-time: Streaming inserts added latency and cost
- Analyst bottleneck: Complex queries required data team intervention
The Solution
We migrated their analytics warehouse from BigQuery to ClickHouse Cloud, using Jitsu for event ingestion.
Architecture Overview
- Jitsu: Event collection from web, mobile, and backend services
- ClickHouse Cloud: Primary analytics warehouse
- dbt: Data transformations and modeling
- Metabase: Self-serve dashboards for all teams
Implementation Process
Week 1-2: Schema Design
We redesigned their data model for ClickHouse's columnar storage. This included choosing optimal partition keys, sorting keys, and compression codecs for each table type.
Week 3-4: Pipeline Migration
We deployed Jitsu to replace their custom Kafka-to-BigQuery pipeline. Jitsu's built-in schema inference and ClickHouse connector simplified the architecture significantly.
Week 5-6: Historical Backfill
We backfilled 18 months of historical data from BigQuery to ClickHouse using parallel batch exports. Data integrity was validated at each step.
Week 7-8: Dashboard Migration
We rewrote SQL queries for ClickHouse syntax and rebuilt dashboards in Metabase. Most queries were 10-50x faster without optimization.
Technical Details
Table Design: For their main events table (200M rows/month), we used: Partition by: toYYYYMM(event_time), Order by: (user_id, event_time), Compression: LZ4 for real-time, ZSTD for historical.
Query Optimization: Common aggregation queries that took 30 seconds in BigQuery now run in under 1 second. We created materialized views for the most expensive aggregations.
Cost Structure: ClickHouse Cloud's pricing (compute + storage) vs. BigQuery's scan-based pricing resulted in predictable, lower costs at their query volume.
Results
- 10x faster queries: P95 dashboard load time from 30s to 3s
- 65% cost reduction: From $15K to $5K monthly
- Real-time analytics: Sub-second data freshness with Jitsu streaming
- Self-serve analysis: Business teams can now run their own queries
"We were paying BigQuery to scan the same data over and over. ClickHouse's columnar storage and proper indexing means we get answers in seconds, not minutes. The cost savings alone justified the migration."