How to tame race conditions in read models
Our Event-Driven Architecture (EDA) looks flawless on the whiteboard. We implement the Outbox Pattern, use reliable queues, and our business logic runs like a Swiss watch. And then... production hits. It turns out that our architecture becomes a three-dimensional tangle where events not only arrive late but also arrive out of order. You know the pain, right? Sometimes we think Event Sourcing is a cure-all, but even an event-sourced system throws exceptions when a handler tries to apply a preliminary risk assessment to a project that doesn't yet exist in the system.
What we perceive as an error – e.g. the ProjectScoreCalculated event arriving before ProjectInitiated – is in fact a cost of efficiency. Project initiation, preliminary technical risk assessment, human resource allocation – all of this happens in parallel because it's faster that way. You can't yell at the cloud to stop. You have to adapt.
Today, we're going to talk about a simple but powerful technique that allows you to live with this: Phantom Records in Read Models as your personal Anti-Corruption Layer (ACL).
Events or rumours? That is the question
The events we publish are facts to us – they represent the state of our knowledge. But what about events coming from outside our module? Those are rumours at best. We process them in the order in which we receive them, not in the order in which they were generated.
Imagine that you have a system that verifies a new project in terms of risk and resources:
1. The Risk Assessment Service publishes: ProjectScoreCalculated(Score: 9/10, Low technical risk).
2. The Main Project Management System publishes: ProjectInitiated(Name: 'New Platform', Budget: PLN 500k).
3. Next comes: ResourceAvailabilityChecked(Required resources: available).
In an ideal world, event 2 would arrive before events 1 and 3. But what if event 1 arrives first? Your elegant aggregator will throw an exception because there is no project with that ID yet! Our job is to denoise these rumours and transform them into our internal facts.
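To make the failure concrete, here is a minimal sketch of the naive approach (the NaiveProjectScoreCalculatedHandler and ProjectReadModel names are hypothetical). It assumes ProjectInitiated is always processed first, so the lookup blows up whenever the score arrives early:
class NaiveProjectScoreCalculatedHandler
  def handle(event)
    # Raises ActiveRecord::RecordNotFound when ProjectInitiated has not been processed yet
    project = ProjectReadModel.find_by!(project_id: event.data[:project_id])
    project.update!(score: event.data[:score], risk_level: event.data[:risk_level])
  end
end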
Instead of fighting for perfect order, we embrace chaos and store data as it comes in.
Read model as an anti-corruption layer (ACL)
A Read Model is not just a display format. It can serve as an ACL, protecting your domain from external clutter. We define a document that represents the state of data gathering for a project, where every piece of external data is optional.
In Ruby, it might look like this:
# The ProjectVerification is our "Phantom Record"
class ProjectVerification < ApplicationRecord
  # project_id must be present for correlation
  attribute :project_id, :uuid

  # Data from external/parallel systems is optional until it arrives
  attribute :project_name, :string          # From ProjectInitiated
  attribute :budget, :decimal               # From ProjectInitiated
  attribute :initiated_at, :datetime

  attribute :score, :integer                # From ProjectScoreCalculated
  attribute :risk_level, :string            # 'low', 'medium', 'high'
  attribute :score_calculated_at, :datetime

  attribute :resources_available, :boolean  # From ResourceAvailabilityChecked
  attribute :resources_checked_at, :datetime

  # Our internal state distilled from external "rumours"
  attribute :status, :string, default: 'data_gathering' # 'data_gathering', 'approved', 'rejected'
  attribute :internal_decision, :string # 'approved' or 'rejected'
end
# Example event handler in Ruby: ProjectScoreCalculated
class ProjectScoreCalculatedHandler
  def handle(event)
    # Find the document, or create a "Phantom Record" if it doesn't exist yet
    verification = ProjectVerification.find_or_initialize_by(project_id: event.data[:project_id])

    # Check for idempotency/staleness using the logical timestamp
    if verification.score_calculated_at.present? && event.data[:calculated_at] < verification.score_calculated_at
      # Skip older event based on the calculated_at timestamp
      return
    end

    # Update the partial state (project details might still be missing!)
    verification.score = event.data[:score]
    verification.risk_level = event.data[:risk_level]
    verification.score_calculated_at = event.data[:calculated_at]

    # An immediate decision can be made with partial data (e.g., immediate rejection on extreme risk)
    if event.data[:risk_level] == 'extreme'
      verification.status = 'rejected'
      verification.internal_decision = 'rejected'
    end

    verification.save!
  end
end
What if ProjectScoreCalculated comes first?
This is the key: if ProjectScoreCalculatedHandler runs first and the ProjectVerification document does not exist, it simply creates it (via find_or_initialize_by). We have a Phantom Record that only has project_id and the preliminary risk assessment data. The project details (name, budget) are still nil. The system knows that something is happening and that it already has a preliminary assessment.
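For illustration, right after only the score event has been handled, the Phantom Record might look roughly like this (the values are made up):
verification = ProjectVerification.find_by(project_id: project_id) # the correlating id from the event
verification.project_name # => nil  (ProjectInitiated has not arrived yet)
verification.budget       # => nil
verification.score        # => 9
verification.risk_level   # => "low"
verification.status       # => "data_gathering"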
When ProjectInitiated arrives, its handler (e.g., ProjectInitiatedHandler) will simply fill in the missing fields in the same document:
class ProjectInitiatedHandler
  def handle(event)
    verification = ProjectVerification.find_or_initialize_by(project_id: event.data[:project_id])

    # Ignore if project details are already present (idempotency)
    return if verification.project_name.present?

    # Fill in the project details
    verification.project_name = event.data[:name]
    verification.budget = event.data[:budget]
    verification.initiated_at = event.data[:initiated_at]

    # The score data might already be there from an earlier event!
    verification.save!
  end
end
In this way, each event can operate independently, building the state incrementally, regardless of order.
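For this to work, the underlying table has to tolerate missing data. A possible migration sketch is shown below (assuming PostgreSQL for the uuid columns; the column names mirror the model above). The point is that everything except project_id and status is nullable:
class CreateProjectVerifications < ActiveRecord::Migration[7.1]
  def change
    create_table :project_verifications, id: :uuid do |t|
      t.uuid :project_id, null: false, index: { unique: true }

      # Everything below is filled in incrementally, by whichever event arrives first
      t.string   :project_name
      t.decimal  :budget
      t.datetime :initiated_at
      t.integer  :score
      t.string   :risk_level
      t.datetime :score_calculated_at
      t.boolean  :resources_available
      t.datetime :resources_checked_at

      t.string :status, null: false, default: 'data_gathering'
      t.string :internal_decision

      t.timestamps
    end
  end
end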
Beyond projection: fact retrieval and processes
We are reaching a subtle boundary. The projection is meant to interpret and store data. The business logic that makes the final decisions belongs to the Process Manager or Aggregate in your domain. In our case, once we have gathered both score (the risk assessment) and resources_available (the resource availability), we can emit a new internal event, e.g. ProjectDataGatheringCompleted. That is the moment when rumours turn into facts in our domain.
We can create a method (part of our Process Manager) that is called after each critical event and attempts to complete the verification:
# In a separate Process Manager or Service
class ProjectVerificationProcessManager
  # This method is called after ProjectScoreCalculated or ResourceAvailabilityChecked updates the Read Model
  def try_complete_verification(project_id)
    verification = ProjectVerification.find_by(project_id: project_id)

    # Nothing to do if the Read Model document does not exist yet
    return if verification.nil?
    # Ignore if the decision was already made
    return if verification.internal_decision.present?

    # Check if we have all necessary pieces of information for the final decision
    # (resources_available can legitimately be false, so test for nil rather than presence)
    if verification.score.present? && !verification.resources_available.nil?
      # Decision logic: reject if resources are unavailable OR the score indicates high technical risk
      if verification.resources_available == false
        new_status = 'rejected'
        reason = 'Resources unavailable'
      elsif verification.score < 5 # Example: a score below 5 means high risk
        new_status = 'rejected'
        reason = 'High technical risk score'
      else
        new_status = 'approved'
        reason = 'Verified and ready for initiation'
      end

      # Update the Read Model with the final decision
      verification.update!(status: new_status, internal_decision: new_status)

      # Publish the internal event to the rest of the domain
      publish_event(ProjectDataGatheringCompleted.new(project_id: project_id, approval: new_status, reason: reason))
    end
  end
end
In this way, we organise external chaos by sequencing it based on the order of our internal observations, and we gain full visibility into which pieces of data have already arrived and which are still missing.
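To close the loop, something has to invoke the Process Manager. One possible wiring (a sketch only; the event field names checked_at and available, and the direct call from the handler, are assumptions) is to call it from each handler once it has persisted its piece of state. A ResourceAvailabilityCheckedHandler could then look like this:
class ResourceAvailabilityCheckedHandler
  def handle(event)
    verification = ProjectVerification.find_or_initialize_by(project_id: event.data[:project_id])

    # Skip stale events based on the check timestamp
    return if verification.resources_checked_at.present? &&
              event.data[:checked_at] < verification.resources_checked_at

    verification.resources_available = event.data[:available]
    verification.resources_checked_at = event.data[:checked_at]
    verification.save!

    # Another piece of the puzzle is in place, so ask the Process Manager whether a decision can be made
    ProjectVerificationProcessManager.new.try_complete_verification(event.data[:project_id])
  end
end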
When it works (and when it doesn't)
It works when:
- You tolerate eventual consistency: a brief data_gathering state on the dashboard in exchange for better system responsiveness is acceptable.
- Business logic accepts partial data: a project can have a preliminary resource reservation status even if its full, formal initiation has not yet arrived.
- You have clear conflict resolution rules: if ProjectScoreCalculated arrives with information about extreme risk after other modules have already started preliminary actions, that risk always invalidates further steps.
It does not work when:
- Immediate Consistency is Required: Legal, financial, or critical safety systems often cannot tolerate even milliseconds of uncertainty.
- Decisions cannot be made without complete data: Some decisions require complete, consistent analysis.
- The cost of rollback is high: If you cannot undo an action, you must enforce stronger ordering guarantees.
Embrace chaos...
External events are rumours. Your Projection (Phantom Record) is a tool for collecting them. Your try_complete_verification method is the deduction step that distils an internal fact (a new, clean event) from the collected rumours.
You cannot change the topology of external systems. You must adapt on your side. Store data as it comes in. Build your state incrementally. Make decisions based on the available data.
Happy eventing!