Event Sourcing: Event Versioning

In event sourcing, you can never retire old event versions. Unlike API versioning where the goal is eventual deprecation, events are immutable history. New code must handle events from day one of your system, forever. This makes versioning strategies fundamentally different.

This post summarises what I’ve learned working with an event-sourced system over the past year. Greg Young has an entire book on this topic, but I still feel versioning is not discussed enough in practice.

Why events require permanent compatibility

In API versioning, the goal is eventual retirement: migrate consumers, deprecate old versions, remove them entirely. This is impractical in event sourcing. Events accumulate indefinitely, replay requires processing all historical events, and “migrating” old events requires complex stream transformations. A CustomerCreated event from 2019 must be processable by code written in 2025.

Producers, consumers, and compatibility direction

In event sourcing, producers (command handlers) write events. Consumers (projections, reactors, aggregates) read them. The compatibility direction depends on which side changes first.

  • Consumer changes first (new code reads old events): backward compatibility. This is the replay scenario. It is always required.
  • Producer changes first (old code reads new events): forward compatibility. This is the deployment scenario. During rolling deployments, a new producer may emit events before all consumers have upgraded.

This matters for both internal and external events:

  • For external/integration events, consumers upgrade on their own schedule. Both backward and forward compatibility are essential.
  • For internal/domain events, producers and consumers typically deploy together. But replay still forces new code to process years of old events. And during rolling deployments, producers and consumers may temporarily run different versions.

Backward compatibility is mandatory. Forward compatibility is highly desirable: it enables hot-swapping projectors without replay, canary deployments, and zero-downtime upgrades.

Deployment order follows from this. If you deploy producers first, consumers must be forward compatible. If you deploy consumers first, they only need backward compatibility, which they already require for replay. Deploying consumers first is safer.

Choosing a versioning strategy

Start with one question: can you make the change backward compatible?

Yes: evolve the schema in place (no versioning needed)

Most changes fall here. Add optional fields with sensible defaults. Add new event types for new concepts. Handle missing fields in consumers.

Adding optional fields with defaults:

type AccountCreated struct {
    ID         string
    CustomerID string
    Source     string  // NEW: defaults to empty string
}

Adding new event types:

type AccountSourceUpdated struct {  // NEW EVENT
    ID     string
    Source string
}

Adding required fields with defaults (upcasting):

type AccountCreated struct {
    ID         string
    CustomerID string
    Status     AccountStatus  // NEW REQUIRED - default to "Active" when missing
}

func (h *Handler) Handle(event *es.Event) {
    var evt AccountCreated
    if err := json.Unmarshal(event.Payload, &evt); err != nil {
        return  // surface or log the error; don't process a half-decoded event
    }

    if evt.Status == "" {
        evt.Status = AccountStatusActive  // Default for old events
    }
}

Never do these (breaking changes):

  • Removing fields: breaks projectors that depend on them
  • Changing field types: old events become unreadable
  • Renaming fields: same as remove + add
  • Changing event semantics: corrupts projections

No: is the change a field type change, rename, or semantic change?

Yes: create a new event type. Don’t version the existing event. Create a complementary one instead.

For example, don’t add Email to CustomerCreated. Emit a separate CustomerEmailUpdated event. Old consumers keep working with old events. New consumers handle both.

No: is there a reasonable default for old events?

Yes: use upcasting. Transform old event shapes to new ones at read time. Consumers only deal with the latest shape.

type AccountCreatedV1 struct { ... }
type AccountCreatedV2 struct { ... }

// Upcaster converts V1 to V2
func UpcastAccountCreated(v1 *AccountCreatedV1) *AccountCreatedV2 {
    return &AccountCreatedV2{
        ID:         v1.ID,
        CustomerID: v1.CustomerID,
        Source:     "legacy",  // Default for field not in V1
    }
}

No replay needed. The transformation happens on-the-fly when events are read from the store.

No reasonable default: use event stream migration (last resort).

  1. Create a new projection with the new schema
  2. Replay all events from start with new handlers
  3. Swap projections atomically when caught up

This is the most disruptive option, but sometimes necessary for major schema overhauls.

Decision matrix

Change Type                     | Backward Compatible? | Forward Compatible? | Strategy
--------------------------------|----------------------|---------------------|-------------------------------------------
Add optional field with default | Yes                  | Yes                 | Evolve schema in place
Add new event type              | Yes                  | Yes                 | Safe, just add new event
Remove optional field           | Yes                  | No                  | Deprecate first, remove after grace period
Add required field              | With defaults        | No                  | Upcast or make optional
Change field type               | No                   | No                  | Create new event type
Rename field                    | No                   | No                  | Treat as remove + add
Change semantics                | No                   | No                  | Create new event type

Implementing projectors with backward compatibility

Handle missing fields gracefully in projectors and reactors:

func (p *AccountProjector) HandleAccountCreated(event *evt.AccountCreated) {
    source := event.Source
    if source == "" {
        source = "unknown"  // Default for old events
    }

    p.db.Insert(Account{
        ID:     event.ID,
        Source: source,
    })
}

Reactors should follow the same pattern. Since they’re typically idempotent, forward compatibility is less critical (you can replay if needed), but full compatibility still allows for hot-swaps.

Testing compatibility

Test that your projectors and reactors can handle both old and new event schemas:

func TestAccountProjector_BackwardCompatibility(t *testing.T) {
    t.Run("handles old events without Source field", func(t *testing.T) {
        oldEvent := `{"ID":"123","CustomerID":"456"}`  // No Source

        var e evt.AccountCreated
        require.NoError(t, json.Unmarshal([]byte(oldEvent), &e))

        projector.HandleAccountCreated(&e)

        account := projector.db.Get("123")
        require.Equal(t, "unknown", account.Source)  // Default applied
    })

    t.Run("handles new events with Source field", func(t *testing.T) {
        newEvent := `{"ID":"123","CustomerID":"456","Source":"web"}`

        var e evt.AccountCreated
        require.NoError(t, json.Unmarshal([]byte(newEvent), &e))

        projector.HandleAccountCreated(&e)

        account := projector.db.Get("123")
        require.Equal(t, "web", account.Source)
    })
}

Role of schema registry

For teams working with external/integration events or distributed ownership, a schema registry provides two key benefits:

  1. Validation and compatibility enforcement: automatically validates that producers conform to registered schemas and enforces compatibility rules. This prevents breaking changes from reaching production and enables safe rolling upgrades.

  2. Centralized schema management: provides a single source of truth for event schemas across independent teams, enabling schema discovery, impact analysis, and governance without direct coordination.

When to consider a schema registry

  • Independence of teams: producers and consumers managed by different teams, especially across time zones or organizations
  • Rate of evolution: frequent schema changes that benefit from automated compatibility checks
  • Reliability requirements: critical systems where preventing invalid events is essential
  • External events: integration events exposed to external consumers who upgrade independently

For internal/domain events in a monorepo where producers and consumers share code, the overhead of a schema registry may outweigh its benefits. The decision matrix and testing patterns above provide sufficient guardrails for most teams.

Beyond schema: versioning value object behavior

The strategies above assume event payloads contain pure data structures. What happens when you use value objects with validation logic in your events?

I encountered this issue when a Phone value object’s validation changed over time. Initially it accepted any 8-digit number, but we later tightened validation to accept only numbers with the Australian country code (61), since all our customers were in Australia.

// Old validation - accepts any 8-digit number
func NewPhone(phone string) (*Phone, error) {
    if len(phone) != 8 {
        return nil, fmt.Errorf("invalid phone")
    }
    return &Phone{value: phone}, nil
}

// New validation - only accepts the Australian (61) country code
func NewPhone(phone string) (*Phone, error) {
    if !strings.HasPrefix(phone, "61") || len(phone) != 8 {
        return nil, fmt.Errorf("invalid phone")
    }
    return &Phone{value: phone}, nil
}

During replay, projections failed when loading historical CustomerCreated events because older events contained non-Australian country codes that were valid at the time. The deserialization succeeded, but validation rejected historically valid data.

This reveals a deeper issue: validation logic is a form of schema that evolves over time. Here are three strategies to handle it.

Strategy 1: Separate event data from domain behavior

Use plain data types in events, applying validation only when reconstructing domain objects:

// Event payload - plain data structure
type CustomerCreatedEvent struct {
    Phone string // Plain string, no validation
}

// Domain model uses value object
type Customer struct {
    phone *Phone
}

func NewCustomer(event CustomerCreatedEvent) (*Customer, error) {
    phone, err := NewPhone(event.Phone) // Validated at construction
    if err != nil {
        return nil, err
    }
    return &Customer{phone: phone}, nil
}

Tradeoffs: events capture a snapshot of data as it was at emission time. You lose compile-time safety in the event layer, and current validation might reject historical data during domain reconstruction (which you must handle gracefully).

Strategy 2: Lazy validation

Delay validation until domain operations, skipping validation during deserialization from events:

func NewPhoneWithSkip(phone string, skipValidation bool) (*Phone, error) {
    if !skipValidation && !strings.HasPrefix(phone, "61") {
        return nil, fmt.Errorf("invalid phone")
    }
    return &Phone{value: phone}, nil
}

// Use for loading from event store
phone, _ := NewPhoneWithSkip(eventData.Phone, true)

Tradeoffs: flexible and preserves domain guarantees for new operations, but requires discipline to use the correct constructor in different contexts (event loading vs. new operations). Easy to misuse.

Strategy 3: Version value objects

Version value objects alongside events. When validation logic changes, create new versions of affected value objects and events.

Tradeoffs: most type-safe approach, but leads to version explosion as every value object change cascades to all events using it. Rarely practical in production systems.

Key principles

  1. Events are contracts. Treat event schemas as public APIs that must be stable.
  2. Backward compatibility is mandatory. Always required for replay. New code must handle old events.
  3. Deploy consumers first. They already need backward compatibility for replay, making this the safer order.
  4. Test with old events. Include tests with historical event shapes to catch compatibility regressions.
  5. Prefer expansion over modification. Add new events rather than changing existing ones.
  6. Version as a last resort. Use explicit versions only when breaking changes are unavoidable.
  7. Be conservative in what you emit, liberal in what you accept.