In event sourcing, you can never retire old event versions. Unlike API versioning where the goal is eventual deprecation, events are immutable history. New code must handle events from day one of your system, forever. This makes versioning strategies fundamentally different.
This post summarises what I’ve learned working with an event-sourced system over the past year. Greg Young has an entire book on this topic, but I still feel versioning is not discussed enough in practice.
Why events require permanent compatibility
In API versioning, the goal is eventual retirement: migrate consumers, deprecate old versions, remove them entirely. This is impractical in event sourcing. Events accumulate indefinitely, replay requires processing all historical events, and “migrating” old events requires complex stream transformations. A CustomerCreated event from 2019 must be processable by code written in 2025.
Producers, consumers, and compatibility direction
In event sourcing, producers (command handlers) write events. Consumers (projections, reactors, aggregates) read them. The compatibility direction depends on which side changes first.
- Consumer changes first (new code reads old events): backward compatibility. This is the replay scenario. It is always required.
- Producer changes first (old code reads new events): forward compatibility. This is the deployment scenario. During rolling deployments, a new producer may emit events before all consumers have upgraded.
This matters for both internal and external events:
- For external/integration events, consumers upgrade on their own schedule. Both backward and forward compatibility are essential.
- For internal/domain events, producers and consumers typically deploy together. But replay still forces new code to process years of old events. And during rolling deployments, producers and consumers may temporarily run different versions.
Backward compatibility is mandatory. Forward compatibility is highly desirable: it enables hot-swapping projectors without replay, canary deployments, and zero-downtime upgrades.
Deployment order follows from this. If you deploy producers first, consumers must be forward compatible. If you deploy consumers first, they only need backward compatibility, which they already require for replay. Deploying consumers first is safer.
Choosing a versioning strategy
Start with one question: can you make the change backward compatible?
Yes: evolve the schema in place (no versioning needed)
Most changes fall here. Add optional fields with sensible defaults. Add new event types for new concepts. Handle missing fields in consumers.
Adding optional fields with defaults:
```go
type AccountCreated struct {
	ID         string
	CustomerID string
	Source     string // NEW: defaults to empty string
}
```
Adding new event types:
```go
type AccountSourceUpdated struct { // NEW EVENT
	ID     string
	Source string
}
```
Adding required fields with defaults (upcasting):
```go
type AccountCreated struct {
	ID         string
	CustomerID string
	Status     AccountStatus // NEW REQUIRED - default to "Active" when missing
}

func (h *Handler) Handle(event *es.Event) {
	var evt AccountCreated
	if err := json.Unmarshal(event.Payload, &evt); err != nil {
		return // malformed payload; real code should surface this error
	}

	if evt.Status == "" {
		evt.Status = AccountStatusActive // Default for old events
	}
}
```
Never do these (breaking changes):
- Removing fields: breaks projectors that depend on them
- Changing field types: old events become unreadable
- Renaming fields: same as remove + add
- Changing event semantics: corrupts projections
No: is the change a field type change, rename, or semantic change?
Yes: create a new event type. Don’t version the existing event. Create a complementary one instead.
For example, don’t add Email to CustomerCreated. Emit a separate CustomerEmailUpdated event. Old consumers keep working with old events. New consumers handle both.
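As a minimal sketch of this pattern (the projection and event names here are illustrative, not taken from a specific codebase), a consumer can fold both the original event and the complementary one into the same read model:

```go
package main

import "fmt"

// CustomerCreated is the original event; it is never modified.
type CustomerCreated struct {
	ID string
}

// CustomerEmailUpdated is a complementary event added later,
// instead of adding an Email field to CustomerCreated.
type CustomerEmailUpdated struct {
	ID    string
	Email string
}

// CustomerProjection builds a read model from both event types.
type CustomerProjection struct {
	emails map[string]string
}

func (p *CustomerProjection) Handle(event any) {
	switch e := event.(type) {
	case CustomerCreated:
		p.emails[e.ID] = "" // email unknown until an update arrives
	case CustomerEmailUpdated:
		p.emails[e.ID] = e.Email
	}
}

func main() {
	p := &CustomerProjection{emails: map[string]string{}}
	p.Handle(CustomerCreated{ID: "123"})
	p.Handle(CustomerEmailUpdated{ID: "123", Email: "a@example.com"})
	fmt.Println(p.emails["123"]) // a@example.com
}
```

Old streams containing only CustomerCreated still project cleanly; streams with the new event simply carry more information.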
No: is there a reasonable default for old events?
Yes: use upcasting. Transform old event shapes to new ones at read time. Consumers only deal with the latest shape.
```go
type AccountCreatedV1 struct { ... }
type AccountCreatedV2 struct { ... }

// Upcaster converts V1 to V2
func UpcastAccountCreated(v1 *AccountCreatedV1) *AccountCreatedV2 {
	return &AccountCreatedV2{
		ID:         v1.ID,
		CustomerID: v1.CustomerID,
		Source:     "legacy", // Default for field not in V1
	}
}
```
No replay is needed: the transformation happens on the fly when events are read from the store.
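One way to wire this in (a simplified sketch; the version-dispatch function and the stored version number are assumptions, not a specific library's API) is to branch on the schema version during deserialization, so callers only ever see the latest shape:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type AccountCreatedV1 struct {
	ID         string
	CustomerID string
}

type AccountCreatedV2 struct {
	ID         string
	CustomerID string
	Source     string
}

// deserializeAccountCreated inspects the stored schema version and
// upcasts older payloads so callers only ever see the latest shape.
func deserializeAccountCreated(version int, payload []byte) (*AccountCreatedV2, error) {
	switch version {
	case 1:
		var v1 AccountCreatedV1
		if err := json.Unmarshal(payload, &v1); err != nil {
			return nil, err
		}
		return &AccountCreatedV2{
			ID:         v1.ID,
			CustomerID: v1.CustomerID,
			Source:     "legacy", // default for the field V1 lacked
		}, nil
	case 2:
		var v2 AccountCreatedV2
		if err := json.Unmarshal(payload, &v2); err != nil {
			return nil, err
		}
		return &v2, nil
	default:
		return nil, fmt.Errorf("unknown AccountCreated version %d", version)
	}
}

func main() {
	evt, _ := deserializeAccountCreated(1, []byte(`{"ID":"123","CustomerID":"456"}`))
	fmt.Println(evt.Source) // legacy
}
```

The key property: only the deserialization layer knows V1 exists; projectors, reactors, and aggregates compile against V2 alone.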
No reasonable default: use event stream migration (last resort).
- Create a new projection with the new schema
- Replay all events from start with new handlers
- Swap projections atomically when caught up
This is the most disruptive option, but sometimes necessary for major schema overhauls.
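The steps above can be sketched roughly as follows (the Projection interface and position tracking are simplified assumptions; a real migration would stream events in batches and track the live head position):

```go
package main

import "fmt"

// Event is a minimal stored-event shape for this sketch.
type Event struct {
	Type    string
	Payload []byte
}

// Projection is anything that can apply events and report progress.
type Projection interface {
	Apply(Event)
	Position() int
}

// migrate replays the full stream into a fresh projection and reports
// whether it has caught up to the live position; the caller then swaps
// it in atomically.
func migrate(stream []Event, fresh Projection, livePosition int) bool {
	for _, e := range stream {
		fresh.Apply(e)
	}
	return fresh.Position() >= livePosition
}

// countingProjection is a stand-in that just counts applied events.
type countingProjection struct{ n int }

func (c *countingProjection) Apply(Event)  { c.n++ }
func (c *countingProjection) Position() int { return c.n }

func main() {
	stream := []Event{{Type: "AccountCreated"}, {Type: "AccountSourceUpdated"}}
	fmt.Println(migrate(stream, &countingProjection{}, 2)) // true
}
```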
Decision matrix
| Change Type | Backward Compatible? | Forward Compatible? | Strategy |
|---|---|---|---|
| Add optional field with default | Yes | Yes | Evolve schema in place |
| Add new event type | Yes | Yes | Safe, just add new event |
| Remove optional field | Yes | No | Deprecate first, remove after grace period |
| Add required field | With defaults | No | Upcast or make optional |
| Change field type | No | No | Create new event type |
| Rename field | No | No | Treat as remove + add |
| Change semantics | No | No | Create new event type |
Implementing projectors with backward compatibility
Handle missing fields gracefully in projectors and reactors:
```go
func (p *AccountProjector) HandleAccountCreated(event *evt.AccountCreated) {
	source := event.Source
	if source == "" {
		source = "unknown" // Default for old events
	}

	p.db.Insert(Account{
		ID:     event.ID,
		Source: source,
	})
}
```
Reactors should follow the same pattern. Since they’re typically idempotent, forward compatibility is less critical (you can replay if needed), but full compatibility still allows for hot-swaps.
Testing compatibility
Test that your projectors and reactors can handle both old and new event schemas:
```go
func TestAccountProjector_BackwardCompatibility(t *testing.T) {
	t.Run("handles old events without Source field", func(t *testing.T) {
		oldEvent := `{"ID":"123","CustomerID":"456"}` // No Source

		var e evt.AccountCreated
		require.NoError(t, json.Unmarshal([]byte(oldEvent), &e))

		projector.HandleAccountCreated(&e)

		account := projector.db.Get("123")
		require.Equal(t, "unknown", account.Source) // Default applied
	})

	t.Run("handles new events with Source field", func(t *testing.T) {
		newEvent := `{"ID":"123","CustomerID":"456","Source":"web"}`

		var e evt.AccountCreated
		require.NoError(t, json.Unmarshal([]byte(newEvent), &e))

		projector.HandleAccountCreated(&e)

		account := projector.db.Get("123")
		require.Equal(t, "web", account.Source)
	})
}
```
Role of schema registry
For teams working with external/integration events or distributed ownership, a schema registry provides two key benefits:
- Validation and compatibility enforcement: automatically validates that producers conform to registered schemas and enforces compatibility rules. This prevents breaking changes from reaching production and enables safe rolling upgrades.
- Centralized schema management: provides a single source of truth for event schemas across independent teams, enabling schema discovery, impact analysis, and governance without direct coordination.
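The backward-compatibility rule a registry enforces can be approximated in a few lines. Treating a schema as a field-name-to-type map, a change is backward compatible only if no existing field is removed or retyped (a deliberately simplified sketch, not any real registry's algorithm):

```go
package main

import "fmt"

// checkBackwardCompatible reports whether events written with the old
// schema remain readable under the updated one: every old field must
// survive with the same type. New fields are allowed.
func checkBackwardCompatible(old, updated map[string]string) error {
	for name, typ := range old {
		newTyp, ok := updated[name]
		if !ok {
			return fmt.Errorf("field %q removed", name)
		}
		if newTyp != typ {
			return fmt.Errorf("field %q changed type %s -> %s", name, typ, newTyp)
		}
	}
	return nil
}

func main() {
	old := map[string]string{"ID": "string", "CustomerID": "string"}
	updated := map[string]string{"ID": "string", "CustomerID": "string", "Source": "string"}
	fmt.Println(checkBackwardCompatible(old, updated)) // <nil>
}
```

A registry runs a check like this (plus forward and transitive variants) on every schema registration, rejecting the publish before a breaking producer ever ships.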
When to consider a schema registry
- Independence of teams: producers and consumers managed by different teams, especially across time zones or organizations
- Rate of evolution: frequent schema changes that benefit from automated compatibility checks
- Reliability requirements: critical systems where preventing invalid events is essential
- External events: integration events exposed to external consumers who upgrade independently
For internal/domain events in a monorepo where producers and consumers share code, the overhead of a schema registry may outweigh its benefits. The decision matrix and testing patterns above provide sufficient guardrails for most teams.
Beyond schema: versioning value object behavior
The strategies above assume event payloads contain pure data structures. What happens when you use value objects with validation logic in your events?
I encountered this issue when a Phone value object’s validation changed over time. Initially it accepted any 8-digit number, but we later tightened validation to only accept Australian country codes (61) since all our customers were from Australia.
```go
// Old validation - accepts any 8-digit number
func NewPhone(phone string) (*Phone, error) {
	if len(phone) != 8 {
		return nil, fmt.Errorf("invalid phone")
	}
	return &Phone{value: phone}, nil
}

// New validation - only accepts Australia (+61) country code
func NewPhone(phone string) (*Phone, error) {
	if !strings.HasPrefix(phone, "61") || len(phone) != 8 {
		return nil, fmt.Errorf("invalid phone")
	}
	return &Phone{value: phone}, nil
}
```
During replay, projections failed when loading historical CustomerCreated events because older events contained non-Australian country codes that were valid at the time. The deserialization succeeded, but validation rejected historically valid data.
This reveals a deeper issue: validation logic is a form of schema that evolves over time. Here are three strategies to handle it.
Strategy 1: Plain data structures for events (Recommended)
Separate event data from domain behavior. Use plain data types in events, applying validation only when reconstructing domain objects:
```go
// Event payload - plain data structure
type CustomerCreatedEvent struct {
	Phone string // Plain string, no validation
}

// Domain model uses value object
type Customer struct {
	phone *Phone
}

func NewCustomer(event CustomerCreatedEvent) (*Customer, error) {
	phone, err := NewPhone(event.Phone) // Validated at construction
	if err != nil {
		return nil, err
	}
	return &Customer{phone: phone}, nil
}
```
Tradeoffs: events capture a snapshot of data as it was at emission time. You lose compile-time safety in the event layer, and current validation might reject historical data during domain reconstruction (which you must handle gracefully).
Strategy 2: Lazy validation
Delay validation until domain operations, skipping validation during deserialization from events:
```go
func NewPhoneWithSkip(phone string, skipValidation bool) (*Phone, error) {
	if !skipValidation && !strings.HasPrefix(phone, "61") {
		return nil, fmt.Errorf("invalid phone")
	}
	return &Phone{value: phone}, nil
}

// Use for loading from the event store
phone, _ := NewPhoneWithSkip(eventData.Phone, true)
```
Tradeoffs: flexible and preserves domain guarantees for new operations, but requires discipline to use the correct constructor in different contexts (event loading vs. new operations). Easy to misuse.
Strategy 3: Version value objects
Version value objects alongside events. When validation logic changes, create new versions of affected value objects and events.
Tradeoffs: most type-safe approach, but leads to version explosion as every value object change cascades to all events using it. Rarely practical in production systems.
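A rough sketch of what this looks like for the Phone example above (names hypothetical): each validation era gets its own constructor, and each event version deserializes with the matching one.

```go
package main

import (
	"fmt"
	"strings"
)

// PhoneV1 preserves the original rule: any 8-digit number.
type PhoneV1 struct{ value string }

func NewPhoneV1(phone string) (*PhoneV1, error) {
	if len(phone) != 8 {
		return nil, fmt.Errorf("invalid phone")
	}
	return &PhoneV1{value: phone}, nil
}

// PhoneV2 carries the tightened rule: Australian (61) prefix required.
type PhoneV2 struct{ value string }

func NewPhoneV2(phone string) (*PhoneV2, error) {
	if !strings.HasPrefix(phone, "61") || len(phone) != 8 {
		return nil, fmt.Errorf("invalid phone")
	}
	return &PhoneV2{value: phone}, nil
}

func main() {
	// A CustomerCreatedV1 event would deserialize with PhoneV1,
	// a CustomerCreatedV2 event with PhoneV2.
	_, errOld := NewPhoneV1("12345678") // valid under V1 rules
	_, errNew := NewPhoneV2("12345678") // rejected under V2 rules
	fmt.Println(errOld == nil, errNew == nil) // true false
}
```

The cascade is visible even in this tiny example: tightening one value object forced a second constructor, and every event embedding a Phone now needs a matching version bump.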
Key principles
- Events are contracts. Treat event schemas as public APIs that must be stable.
- Backward compatibility is mandatory. Always required for replay. New code must handle old events.
- Deploy consumers first. They already need backward compatibility for replay, making this the safer order.
- Test with old events. Include tests with historical event shapes to catch compatibility regressions.
- Prefer expansion over modification. Add new events rather than changing existing ones.
- Version as a last resort. Use explicit versions only when breaking changes are unavoidable.
- Be conservative in what you emit, liberal in what you accept.