Event schema governance is the practice of defining, enforcing, and maintaining standards for your event data. Without governance, your analytics data quickly becomes a mess of inconsistent event names, missing properties, and undocumented schemas that nobody trusts.
This guide covers the essential practices for establishing schema governance that scales with your organization.
Why Schema Governance Matters
Poor data quality is the root cause of most analytics failures. Schema governance addresses this by establishing clear standards and processes that ensure shared meaning, consistent quality, and lasting trust in data across your organization.
The Cost of Poor Governance
- Wasted engineering time: Debugging data quality issues instead of building features
- Unreliable reports: Stakeholders lose trust when numbers don't match
- Duplicate efforts: Teams create overlapping events without coordination
- Technical debt: Legacy event schemas that nobody understands
- Compliance risk: PII in unexpected places creates GDPR/CCPA/CPRA exposure
- Failed integrations: Inconsistent schemas break downstream consumers and third-party tools
- Delayed insights: Data teams spend weeks cleaning and reformatting data instead of generating insights
Benefits of Strong Governance
- Data trust: Stakeholders confidently use analytics for decisions
- Faster development: Clear contracts between teams reduce back-and-forth
- Easier onboarding: New team members understand the data quickly
- Reduced maintenance: Consistent patterns simplify tooling and queries
- Compliance confidence: Know exactly what data you collect and where
- Seamless interoperability: Systems communicate reliably through shared schema contracts
- Self-service analytics: Well-documented schemas enable teams to discover and use data independently
Naming Conventions
Consistent naming is the foundation of good governance. Establish conventions before you track your first event.
Event Naming Standards
Recommended: Object-Action Pattern
Structure events as [Object] [Action] where the object is the entity and action is what happened. This pattern is widely adopted across event-driven architectures and analytics platforms:
- User Signed Up, not "signup" or "user_signup"
- Order Completed, not "purchase" or "checkout_complete"
- Feature Flag Evaluated, not "ff_eval" or "flag_check"
- Dashboard Viewed, not "view_dashboard" or "dashboard_view"
- Cart Item Added, not "add_to_cart" or "cartAdd"
- Payment Failed, not "payment_error" or "paymentFailure"
Naming Rules:
- Use Title Case: "User Signed Up" not "user signed up" or "USER_SIGNED_UP"
- Use past tense: "Created" not "Create" (events are things that happened)
- Be specific: "Signup Button Clicked" not just "Button Clicked"
- No abbreviations: "Configuration" not "config", "User" not "usr"
- No platform prefixes: Don't use "web_" or "ios_" — use properties instead
- Avoid overly generic names: Never use names like "Event1" or "Action"
- Keep it descriptive and concise: Names should reflect the action or state change they represent
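Most of these rules lend themselves to automated linting in CI. A minimal sketch in Python; the regex and the banned-abbreviation list are illustrative assumptions, not a complete ruleset (past tense, for example, is not checked here):

```python
import re

# Hypothetical event-name linter: Title Case, at least two words (object +
# action), no banned abbreviations. Extend BANNED_WORDS with your own list.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+( [A-Z][a-z]+)+$")
BANNED_WORDS = {"Config", "Usr", "Ff"}

def lint_event_name(name: str) -> list[str]:
    """Return a list of violations; an empty list means the name passes."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append("must be Title Case words separated by single spaces")
    if any(word in BANNED_WORDS for word in name.split()):
        problems.append("contains a banned abbreviation")
    return problems

print(lint_event_name("Order Completed"))  # []
print(lint_event_name("user_signup"))      # one violation
```

Running a check like this on every new tracking call keeps convention drift from accumulating silently.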
Property Naming Standards
Use snake_case for all properties to ensure database compatibility across systems like BigQuery, Snowflake, and Redshift:
// Good
{
  "plan_type": "enterprise",
  "signup_source": "google_ads",
  "is_premium": true,
  "created_at": "2025-01-15T10:30:00Z",
  "item_count": 3
}

// Bad
{
  "planType": "enterprise",      // camelCase breaks SQL queries
  "SignupSource": "google_ads",  // PascalCase inconsistent
  "Premium": true,               // missing is_ prefix for boolean
  "createdAt": "2025-01-15"      // inconsistent date format, missing time
}
Standard Property Prefixes and Suffixes
- is_ prefix for booleans: is_premium, is_active, is_verified
- has_ prefix for boolean presence: has_subscription, has_profile_photo
- _at suffix for timestamps: created_at, updated_at, completed_at
- _id suffix for identifiers: user_id, order_id, session_id
- _count suffix for counts: item_count, retry_count
- _url suffix for URLs: page_url, referrer_url
- _name suffix for display names: product_name, campaign_name
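These conventions can also be enforced mechanically. A sketch of a property linter in Python; the boolean-prefix rule is the only value-aware check implemented here, and the rule set is an illustrative assumption:

```python
import re

# Hypothetical property-name linter for the conventions above.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def check_property_name(name: str, value: object) -> list[str]:
    """Return a list of violations; an empty list means the property passes."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append("not snake_case")
    if isinstance(value, bool) and not name.startswith(("is_", "has_")):
        problems.append("boolean properties need an is_ or has_ prefix")
    return problems

print(check_property_name("is_premium", True))  # []
print(check_property_name("Premium", True))     # two violations
```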
Standard Date and Time Formats
Use ISO 8601 format consistently for all timestamps:
- Full timestamp: 2025-01-15T10:30:00Z (UTC preferred)
- With timezone offset: 2025-01-15T10:30:00-05:00
- Date only: 2025-01-15
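Producing these formats is straightforward with the standard library. A small sketch that normalizes any timezone-aware datetime to the UTC "Z" form shown above (the helper name is an assumption):

```python
from datetime import datetime, timezone

# Normalize timestamps to ISO 8601 with an explicit "Z" suffix for UTC.
def iso_utc(dt: datetime) -> str:
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")

signup = datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc)
print(iso_utc(signup))            # 2025-01-15T10:30:00Z
print(signup.date().isoformat())  # 2025-01-15
```

Normalizing at the producer keeps downstream warehouses from having to guess timezones.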
Schema Validation
Validation ensures that events conform to their defined schemas before they enter your data pipeline, preventing runtime failures and ensuring data consistency.
JSON Schema Definitions
JSON Schema provides a declarative language for annotating and validating JSON documents. Use the latest stable draft (currently Draft 2020-12) for new schemas:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Order Completed",
  "description": "Fired when a customer successfully completes a purchase",
  "type": "object",
  "required": ["order_id", "total", "currency", "items"],
  "properties": {
    "order_id": {
      "type": "string",
      "pattern": "^ORD-[A-Z0-9]{8}$",
      "description": "Unique order identifier"
    },
    "total": {
      "type": "number",
      "minimum": 0,
      "description": "Order total in specified currency"
    },
    "currency": {
      "type": "string",
      "enum": ["USD", "EUR", "GBP", "CAD", "AUD"],
      "description": "ISO 4217 currency code"
    },
    "discount_code": {
      "type": ["string", "null"],
      "description": "Applied discount code, if any"
    },
    "items": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["product_id", "quantity", "price"],
        "properties": {
          "product_id": { "type": "string" },
          "quantity": { "type": "integer", "minimum": 1 },
          "price": { "type": "number", "minimum": 0 }
        },
        "additionalProperties": false
      }
    }
  },
  "additionalProperties": false
}
Validation Strategies
1. Client-Side Validation
- Catch errors early in development
- TypeScript types generated from schemas for compile-time safety
- Development-only warnings for schema violations
- Immediate feedback during implementation
// TypeScript type generated from schema
interface OrderCompletedEvent {
  order_id: string;
  total: number;
  currency: 'USD' | 'EUR' | 'GBP' | 'CAD' | 'AUD';
  discount_code?: string | null;
  items: Array<{
    product_id: string;
    quantity: number;
    price: number;
  }>;
}

// Usage with type safety
function trackOrderCompleted(event: OrderCompletedEvent) {
  analytics.track('Order Completed', event);
}
2. Server-Side Validation
- Validate at ingestion point before storage
- Log validation failures for debugging with detailed error messages
- Choose enforcement level: warn, reject, or coerce
- Use high-performance validators like AJV (JavaScript) or jsonschema (Python)
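The shape of an ingestion-time check can be prototyped without a full validator. A stdlib-only sketch for the Order Completed event; a real pipeline would run AJV or python-jsonschema against the complete JSON Schema, and the abbreviated field list here is an assumption:

```python
# Minimal ingestion-time validator sketch: required fields, basic types, and
# one range rule from the Order Completed schema. Not a JSON Schema engine.
REQUIRED_FIELDS = {
    "order_id": str,
    "total": (int, float),
    "currency": str,
    "items": list,
}

def validate_order_completed(event: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the event passes."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing required field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(f"wrong type for {field}")
    if isinstance(event.get("total"), (int, float)) and event["total"] < 0:
        errors.append("total must be >= 0")
    return errors

event = {"order_id": "ORD-A1B2C3D4", "total": 149.99, "currency": "USD", "items": [{}]}
print(validate_order_completed(event))  # []
```

The chosen enforcement level then decides what a non-empty error list triggers: a logged warning in development, a rejected event in production.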
3. Schema Registry Validation
- Centralized schema storage with version control
- Compatibility checks prevent breaking changes
- Integration with data pipelines for automated validation
- Support for multiple serialization formats (Avro, Protobuf, JSON Schema)
4. Data Quality Monitoring
- Track schema compliance rates over time
- Alert on sudden drops in compliance
- Identify sources of validation failures
- Monitor property fill rates and data distributions
Validation Best Practices
- Be specific: Define schemas as strictly as possible to catch more errors
- Use the right version: Specify the JSON Schema draft in the `$schema` keyword
- Leverage built-in formats: Use standard formats like `email`, `date-time`, and `uri`
- Keep it DRY: Use `$ref` to reuse common schema definitions
- Validate on both sides: Client-side for UX, server-side for security
- Use `additionalProperties: false`: Catch unexpected fields to prevent schema drift
Privacy and Compliance
Schema governance plays a critical role in privacy compliance. With regulations like GDPR, CCPA, and CPRA, organizations must carefully control what data they collect and how it's used.
PII Detection and Prevention
Personally Identifiable Information (PII) in analytics data creates significant compliance risk. Implement these safeguards:
- Automated PII scanning: Scan event payloads for patterns matching emails, phone numbers, SSNs, credit card numbers
- Schema-level restrictions: Explicitly prohibit PII fields in analytics schemas
- Data masking: Automatically hash or redact sensitive values at ingestion
- Regular audits: Review event data for unexpected PII exposure
# PII detection rules
pii_patterns:
  email:
    pattern: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
    action: redact
  phone:
    pattern: "\\+?[0-9]{10,15}"
    action: redact
  credit_card:
    pattern: "[0-9]{13,19}"
    action: block
  ssn:
    pattern: "[0-9]{3}-[0-9]{2}-[0-9]{4}"
    action: block
Compliance Requirements by Regulation
GDPR (European Union):
- Explicit consent required before data collection
- Right to access: Users can request their data
- Right to erasure: Users can request data deletion
- Data minimization: Collect only what's necessary
- Purpose limitation: Use data only for stated purposes
CCPA/CPRA (California):
- Right to know what data is collected
- Right to delete personal information
- Right to opt out of data sale/sharing
- Special protections for sensitive personal information
Privacy-Compliant Schema Design
{
  "title": "Page Viewed",
  "type": "object",
  "properties": {
    "page_path": {
      "type": "string",
      "description": "URL path without query parameters"
    },
    "referrer_domain": {
      "type": "string",
      "description": "Referrer domain only, not full URL"
    },
    "session_id": {
      "type": "string",
      "description": "Anonymized session identifier"
    },
    "country_code": {
      "type": "string",
      "description": "Country-level location only"
    }
  },
  "x-pii-fields": [],
  "x-consent-required": true,
  "x-retention-days": 365
}
Documentation Standards
Documentation makes schemas usable. Without it, even well-designed schemas become tribal knowledge that only a few team members understand.
Event Documentation Template
## Event: Order Completed
**Description:** Fired when a customer successfully completes a purchase.
**Trigger:** After payment confirmation and order creation in database.
**Category:** Conversion
**Owner:** E-commerce Team
**Privacy Classification:** Contains no PII
**Consent Required:** Yes (analytics consent)
### Properties
| Property | Type | Required | Description | Example |
|----------|------|----------|-------------|---------|
| order_id | string | Yes | Unique order identifier (pattern: ORD-XXXXXXXX) | "ORD-A1B2C3D4" |
| total | number | Yes | Order total in specified currency | 149.99 |
| currency | string | Yes | ISO 4217 currency code | "USD" |
| discount_code | string | No | Applied discount code | "SUMMER20" |
| items | array | Yes | List of purchased items (min: 1) | See below |
| payment_method | string | No | Payment method used | "credit_card" |
### Item Object
| Property | Type | Required | Description |
|----------|------|----------|-------------|
| product_id | string | Yes | Product identifier |
| quantity | integer | Yes | Number of units (min: 1) |
| price | number | Yes | Unit price (min: 0) |
### Example Payload
```json
{
"event": "Order Completed",
"properties": {
"order_id": "ORD-A1B2C3D4",
"total": 149.99,
"currency": "USD",
"payment_method": "credit_card",
"items": [
{"product_id": "PROD-001", "quantity": 2, "price": 49.99},
{"product_id": "PROD-002", "quantity": 1, "price": 50.01}
]
}
}
```
### Related Events
- Order Started
- Checkout Step Completed
- Payment Failed
- Refund Requested
### Downstream Consumers
- Revenue Dashboard
- Marketing Attribution Model
- Customer Lifetime Value Pipeline
- Email Trigger Service
Documentation Tools and Practices
- Schema registries: Confluent Schema Registry, AWS Glue Schema Registry, Apicurio Registry for centralized storage with versioning
- Auto-generated docs: Generate documentation directly from JSON Schema definitions
- Living documentation: Sync documentation with actual implementation through CI/CD
- Search and discovery: Make events findable by stakeholders through catalogs
- Lineage tracking: Document how data flows from source to insight
Change Management
Schemas evolve over time. A robust change management process prevents breaking downstream consumers and maintains data integrity.
Types of Schema Changes
Non-Breaking Changes (Safe):
- Adding new optional properties with default values
- Adding new enum values
- Relaxing validation (e.g., making required field optional)
- Expanding numeric ranges
- Adding new event types
- Improving descriptions and documentation
Breaking Changes (Require Migration):
- Removing properties
- Renaming properties or events
- Changing data types
- Making optional fields required
- Restricting enum values
- Changing property semantics
Compatibility Modes
Schema registries typically support these compatibility modes:
- BACKWARD: New schemas can read data written by old schemas (consumers can upgrade first)
- FORWARD: Old schemas can read data written by new schemas (producers can upgrade first)
- FULL: Both backward and forward compatible
- TRANSITIVE: Compatible with all previous versions, not just the immediately prior one
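Compatibility rules like these can be approximated in code. A heuristic sketch of a BACKWARD check for JSON-Schema-style definitions; this is an illustrative assumption, far simpler than what a real registry such as Confluent's implements:

```python
# Heuristic BACKWARD-compatibility check: the new schema must still accept
# data that was written under the old schema.
def is_backward_compatible(old: dict, new: dict) -> bool:
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))
    if new_required - old_required:
        return False  # old payloads lack the newly required fields
    removed = set(old.get("properties", {})) - set(new.get("properties", {}))
    if removed and new.get("additionalProperties", True) is False:
        return False  # old payloads carry fields the new schema now rejects
    return True

old = {"properties": {"order_id": {}}, "required": ["order_id"]}
adds_optional = {"properties": {"order_id": {}, "total": {}}, "required": ["order_id"]}
adds_required = {"properties": {"order_id": {}, "total": {}}, "required": ["order_id", "total"]}
print(is_backward_compatible(old, adds_optional))  # True
print(is_backward_compatible(old, adds_required))  # False
```

Running a check like this in CI, before a schema change merges, is what turns the compatibility modes from policy into enforcement.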
Change Request Process
- Request: Submit change request with business justification and impact analysis
- Review: Schema owner reviews impact on downstream consumers using lineage
- Compatibility Check: Automated validation against compatibility rules
- Approval: Stakeholders sign off on timeline and migration plan
- Implementation: Update schema, documentation, and validation rules
- Communication: Notify all consumers of the change with migration guide
- Deprecation: For breaking changes, maintain both versions during transition
- Monitoring: Track adoption and usage of new schema version
Schema Versioning
Use semantic versioning to communicate the nature of changes:
{
  "name": "Order Completed",
  "version": "2.1.0",
  "compatibility": "BACKWARD",
  "deprecated_versions": ["1.0.0", "1.1.0"],
  "changelog": [
    {
      "version": "2.1.0",
      "date": "2025-01-15",
      "type": "minor",
      "changes": ["Added discount_code optional property"]
    },
    {
      "version": "2.0.0",
      "date": "2024-10-01",
      "type": "major",
      "changes": ["Renamed product_price to price (breaking)"],
      "migration_guide": "https://docs.example.com/migrations/order-v2"
    },
    {
      "version": "1.1.0",
      "date": "2024-06-15",
      "type": "minor",
      "changes": ["Added payment_method optional property"]
    }
  ]
}
Versioning Semantics:
- MAJOR: Breaking changes that are incompatible with existing data or consumers
- MINOR: Additive changes that consumers can safely ignore
- PATCH: Non-breaking fixes (documentation, bug fixes in validation)
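These semantics map mechanically onto the change types listed under Change Management, so the required bump can be computed rather than argued over. A sketch (the change-type names are illustrative assumptions):

```python
# Derive the required semantic-version bump from a set of change types.
BREAKING = {"remove_property", "rename_property", "change_type", "make_required"}
ADDITIVE = {"add_optional_property", "add_enum_value", "add_event"}

def required_bump(changes: set[str]) -> str:
    if changes & BREAKING:
        return "major"
    if changes & ADDITIVE:
        return "minor"
    return "patch"  # documentation or validation fixes only

def bump(version: str, level: str) -> str:
    major, minor, patch = map(int, version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

print(bump("2.0.0", required_bump({"add_optional_property"})))  # 2.1.0
```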
Deprecation Policy
- Announce deprecation at least 30 days before removal
- Track usage of deprecated events/properties via monitoring
- Provide migration guides for breaking changes
- Remove deprecated schemas only when usage reaches zero
- Send automated alerts to owners of downstream assets still using deprecated versions
Quality Monitoring
Continuous monitoring ensures governance standards are maintained over time and catches issues before they impact business decisions.
Key Quality Metrics
Schema Compliance Rate:
-- Calculate schema compliance rate (BigQuery syntax)
SELECT
  event_type,
  COUNT(*) AS total_events,
  COUNTIF(is_valid) AS valid_events,
  ROUND(COUNTIF(is_valid) / COUNT(*) * 100, 2) AS compliance_rate
FROM events_with_validation
WHERE event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY event_type
ORDER BY compliance_rate ASC;
Property Fill Rates (Completeness):
-- Monitor property completeness
SELECT
  event_type,
  COUNTIF(user_id IS NOT NULL) / COUNT(*) * 100 AS user_id_fill_rate,
  COUNTIF(session_id IS NOT NULL) / COUNT(*) * 100 AS session_id_fill_rate,
  COUNTIF(page_path IS NOT NULL) / COUNT(*) * 100 AS page_path_fill_rate
FROM events
WHERE event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY event_type;
Data Freshness:
-- Monitor data freshness by source
SELECT
  source,
  MAX(event_time) AS last_event_time,
  TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(event_time), MINUTE) AS minutes_since_last
FROM events
GROUP BY source
HAVING minutes_since_last > 60;
Core Data Quality Dimensions
Monitor these fundamental dimensions of data quality:
- Accuracy: Does data reflect real-world objects and events correctly?
- Completeness: Are all required records and values present?
- Consistency: Is data uniform across different systems and sources?
- Timeliness: Is data sufficiently up-to-date for its intended use?
- Validity: Does data conform to business rules and allowable parameters?
- Uniqueness: Are there duplicate representations of the same data?
Automated Quality Checks
- Type consistency: Detect when property types change unexpectedly
- Cardinality monitoring: Alert on unusual value distributions
- Volume anomalies: Detect sudden drops or spikes in event volume
- PII scanning: Identify potential PII in unexpected properties
- Null rate monitoring: Track unexpected increases in null values
- Referential integrity: Verify relationships between events are maintained
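The type-consistency check is simple enough to sketch end to end. The idea: tally the observed type of each property across a batch and flag any property whose values drift from a single dominant type (the threshold and function shape are assumptions):

```python
from collections import Counter

# Type-drift detector sketch: flag properties observed with more than one type
# beyond a small tolerance.
def type_drift(events: list[dict], threshold: float = 0.001) -> dict:
    counts: dict[str, Counter] = {}
    for event in events:
        for key, value in event.items():
            counts.setdefault(key, Counter())[type(value).__name__] += 1
    drifted = {}
    for key, type_counts in counts.items():
        total = sum(type_counts.values())
        dominant = type_counts.most_common(1)[0][1]
        if (total - dominant) / total > threshold:
            drifted[key] = dict(type_counts)
    return drifted

batch = [{"item_count": 3}, {"item_count": 2}, {"item_count": "4"}]
print(type_drift(batch))  # {'item_count': {'int': 2, 'str': 1}}
```

The same tallying pattern extends naturally to null-rate and cardinality monitoring.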
Alerting Rules
# Data quality alerts
alerts:
  - name: schema_compliance_drop
    condition: compliance_rate < 95%
    window: 1 hour
    severity: warning
    notify: [data-team-slack, event-owner]
  - name: missing_required_property
    condition: fill_rate < 99%
    properties: [user_id, event_time]
    severity: critical
    notify: [pagerduty, data-team-slack]
  - name: event_volume_anomaly
    condition: volume_change > 50%
    window: 1 hour
    baseline: 7_day_average
    severity: warning
  - name: data_freshness_violation
    condition: minutes_since_last_event > 60
    severity: critical
    notify: [pagerduty]
  - name: pii_detected
    condition: pii_scan_matches > 0
    severity: critical
    notify: [security-team, compliance-team]
  - name: type_drift_detected
    condition: type_mismatch_rate > 0.1%
    severity: warning
    notify: [data-team-slack]
Data Quality Tools
Consider these tools for automated data quality monitoring:
- Great Expectations: Open-source data validation with schema expectations
- dbt tests: Built-in data quality testing for transformation pipelines
- Monte Carlo: ML-based data observability platform
- Soda: Data quality checks with SodaCL language
- Datafold: Data diff and quality monitoring
Schema Registry and Tooling
A schema registry is a centralized repository that stores and manages schemas, enabling version control, compatibility validation, and governance at scale.
Popular Schema Registries
- Confluent Schema Registry: Industry standard for Kafka environments, supports Avro, Protobuf, and JSON Schema
- AWS Glue Schema Registry: Cloud-native solution with tight AWS integration
- Apicurio Registry: Open-source alternative for multi-format schema management
- Azure Schema Registry: For Azure Event Hubs integration
Schema Registry Benefits
- Single source of truth: All schemas in one discoverable location
- Version history: Track all changes to schemas over time
- Compatibility enforcement: Automatically reject incompatible changes
- Code generation: Generate typed clients from schemas
- Documentation: Auto-generated docs from schema definitions
- Governance workflows: Approval processes for schema changes
Serialization Format Comparison
| Format | Best For | Schema Evolution | Human Readable |
|---|---|---|---|
| JSON Schema | Web APIs, analytics | Good with validation | Yes |
| Avro | Internal pipelines | Excellent | No (binary) |
| Protobuf | Low-latency RPC | Good | No (binary) |
Governance at Scale: Federated Governance
For large organizations with multiple teams or subsidiaries, federated governance balances central standards with team autonomy.
Federated Governance Model
- Central standards: Naming conventions, required properties, privacy rules
- Local ownership: Teams own their domain-specific schemas
- Shared entities: Common definitions for users, sessions, products
- Interoperability rules: Standards that enable cross-team data sharing
Common Entity Definitions
Define shared entities that multiple teams reference:
{
  "$id": "https://schemas.company.com/entities/user.json",
  "title": "User Entity",
  "description": "Standard user identification across all events",
  "type": "object",
  "properties": {
    "user_id": {
      "type": "string",
      "description": "Persistent user identifier"
    },
    "anonymous_id": {
      "type": "string",
      "description": "Anonymous identifier for non-authenticated users"
    },
    "session_id": {
      "type": "string",
      "description": "Current session identifier"
    }
  }
}
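Shared entities are typically pulled into event schemas via $ref. A stdlib-only sketch of resolving a local reference, to make the mechanics concrete; real validators (AJV, python-jsonschema) perform this resolution for you, and the schema names here are assumptions:

```python
# Minimal local "$ref" resolver showing how a shared entity definition is
# embedded in an event schema via "$defs".
USER_ENTITY = {
    "type": "object",
    "properties": {
        "user_id": {"type": "string"},
        "anonymous_id": {"type": "string"},
    },
}

page_viewed_schema = {
    "$defs": {"user": USER_ENTITY},
    "type": "object",
    "properties": {
        "user": {"$ref": "#/$defs/user"},
        "page_path": {"type": "string"},
    },
}

def resolve_local_ref(schema: dict, ref: str) -> dict:
    """Follow a '#/a/b' style reference within the same document."""
    node = schema
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node

resolved = resolve_local_ref(page_viewed_schema, "#/$defs/user")
print(resolved is USER_ENTITY)  # True
```

Because every event schema points at the same definition, a change to the shared entity propagates everywhere it is referenced, which is exactly the DRY property federated governance relies on.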
Governance Roles and Responsibilities
Clear ownership prevents governance from becoming everyone's job (and therefore nobody's job).
Role Definitions
Schema Owner / Data Steward:
- Approves new events and schema changes
- Maintains documentation standards
- Reviews and approves change requests
- Enforces governance policies
- Typically: Data/Analytics team lead or Data Governance lead
Event Owners:
- Own specific events or event categories
- Responsible for documentation accuracy
- First point of contact for questions
- Coordinate with consumers on changes
- Typically: Product or engineering team members
Implementers:
- Implement tracking according to schema
- Report schema issues during development
- Follow naming conventions and standards
- Write tests validating schema compliance
- Typically: Frontend and backend engineers
Consumers:
- Use event data for analysis and reporting
- Report data quality issues
- Request new events or properties
- Provide feedback on schema usability
- Typically: Product managers, analysts, data scientists
Governance Review Cadence
- Weekly: Review new event requests and pending changes
- Monthly: Review data quality metrics and compliance trends
- Quarterly: Audit unused events and deprecated schemas
- Annually: Review and update governance policies
Implementation Checklist
Use this checklist to implement schema governance in your organization:
- Establish naming conventions
  - Document event naming pattern (Object-Action recommended)
  - Document property naming standards (snake_case)
  - Create list of standard properties and prefixes
  - Define date/time format standards (ISO 8601)
- Set up schema validation
  - Choose validation tool (AJV, jsonschema) or schema registry
  - Define enforcement level per environment (warn in dev, reject in prod)
  - Create process for adding new schemas
  - Implement CI/CD integration for schema validation
- Implement privacy controls
  - Define PII detection rules
  - Document consent requirements per event
  - Set up automated PII scanning
  - Create data retention policies
- Create documentation system
  - Choose documentation platform or schema registry
  - Create templates for event documentation
  - Document existing events
  - Set up auto-generation from schemas
- Define change management process
  - Create change request template
  - Define approval workflow
  - Establish deprecation policy (30+ day notice)
  - Set up compatibility checking
- Implement quality monitoring
  - Set up compliance tracking
  - Configure alerting rules
  - Create quality dashboards
  - Define SLAs for data quality metrics
- Assign ownership
  - Designate schema owner(s)
  - Assign event owners by domain
  - Communicate responsibilities
  - Establish review cadence
Summary
Effective schema governance requires:
- Clear standards: Naming conventions everyone follows consistently
- Validation: Automated enforcement of schema compliance at ingestion
- Privacy compliance: PII detection, consent management, and regulatory adherence
- Documentation: Schemas that are discoverable, understandable, and up-to-date
- Change management: Controlled evolution with compatibility rules and migration paths
- Quality monitoring: Continuous visibility into data health across all dimensions
- Clear ownership: Defined roles, responsibilities, and escalation paths
- Tooling: Schema registries and validation libraries that scale with your organization
Start small, be consistent, and iterate. Good governance compounds over time, making your analytics infrastructure more valuable and trustworthy with every event you track. Remember that schema governance is not a one-time project but an ongoing practice that requires continuous attention and improvement.