Race Condition in Concurrent Counter Updates: A Real-World Example

As senior engineers, we’ve all encountered subtle bugs that only manifest under high concurrency. Today, I want to share a particularly sneaky race condition I discovered in a counter update system—one that involved Redis, database updates, and the dangerous assumption that the value you just read is still the latest.

The Problem

Let’s start with the seemingly innocent code that caused our race condition. I’m using Ruby here, but you’ll encounter this same race condition in Python, Java, Go, or any language with concurrent execution.

post_counter = redis_post_counter.incr
User.update(post_counter: post_counter)

At first glance, this looks reasonable:

  1. Increment the counter in Redis (our fast, atomic data store) and assign the returned value to a local variable
  2. Update the user’s post counter in the database

Redis guarantees that the increment operation is thread-safe, but that guarantee doesn’t extend to what you do with the value afterward.
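To make the failure mode concrete before walking through the timeline, here is a minimal, deterministic sketch—no real Redis or database, just plain Ruby. The increments hand out unique values exactly as Redis would; the bug appears when the simulated database writes land in a different order than the increments:

```ruby
# Simulate five concurrent requests. Each one atomically increments the
# counter and captures the value it received at that moment.
counter = 0
captured = (1..5).map { counter += 1 }  # => [1, 2, 3, 4, 5]

# Now the "database" writes complete in an arbitrary order. Each write
# uses the stale value its request captured earlier.
db_value = nil
[5, 3, 2, 4, 1].each { |v| db_value = v }

puts db_value  # => 1, even though five posts were created
```

The write order `[5, 3, 2, 4, 1]` is just one illustrative interleaving; in production the order is whatever network latency and scheduling happen to produce.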

The Race Condition Explained

Here’s what actually happens when 5 concurrent requests hit this code:

Timeline of Execution

Time →

Request 1: redis.incr → gets 1 ────────────────────────────→ DB.update(1) ✓
Request 2: redis.incr → gets 2 ──────────────→ DB.update(2) ✓
Request 3: redis.incr → gets 3 ────────→ DB.update(3) ✓
Request 4: redis.incr → gets 4 ──────────────────→ DB.update(4) ✓
Request 5: redis.incr → gets 5 ──→ DB.update(5) ✓

In the diagram above, each request correctly receives a unique counter value from Redis. The problem isn’t with Redis—it’s with the database updates.

The Critical Issue

The database updates don’t execute in the same order as the Redis increments. Here’s a nightmare scenario:

Process 1: redis.incr → post_counter = 1
Process 2: redis.incr → post_counter = 2
Process 3: redis.incr → post_counter = 3
Process 4: redis.incr → post_counter = 4
Process 5: redis.incr → post_counter = 5

# Database updates happen in a different order:
Process 5: User.update(post_counter: 5) ✓
Process 3: User.update(post_counter: 3) ✗
Process 2: User.update(post_counter: 2) ✗  
Process 4: User.update(post_counter: 4) ✗  
Process 1: User.update(post_counter: 1) ✗  

The final database value is 1, even though 5 posts were actually created.

This happens because:

  • Network latency varies between requests
  • Database connection pool timing differs
  • Process scheduling is non-deterministic
  • The variable post_counter captured the value at the time of increment, not at the time of update

Stale Data

The fundamental problem is that we’re storing the counter value in a local variable:

post_counter = redis_post_counter.incr  # Captures value at time T
# ... time passes, other requests complete ...
User.update(post_counter: post_counter)  # Uses stale value from time T

Between the Redis increment and the database update, the world has moved on. Our post_counter variable is a snapshot of the past, not the present.

The Solution: Always Fetch Latest

The fix is elegantly simple—never trust a stored value when the source of truth is elsewhere:

redis_post_counter.incr                        # Increment atomically
User.update(post_counter: redis_post_counter.get)  # Fetch latest value
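Re-running the earlier sketch with this one-line change shows why the fix holds up. Again this simulates Redis with a plain counter—the point is only that each write now reads the source of truth at write time instead of using a captured copy:

```ruby
# Five atomic increments happen first; the counter ends at 5.
counter = 0
5.times { counter += 1 }

# Database writes still land in an arbitrary order, but each one
# fetches the current counter value rather than a stale snapshot.
db_value = nil
[5, 3, 2, 4, 1].each { db_value = counter }

puts db_value  # => 5
```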

Why This Works

Process 1: redis.incr (counter = 1)
Process 2: redis.incr (counter = 2)
Process 3: redis.incr (counter = 3)
Process 4: redis.incr (counter = 4)
Process 5: redis.incr (counter = 5)

# Regardless of database update order:
Process 5: User.update(redis.get) → gets 5, updates to 5 ✓
Process 3: User.update(redis.get) → gets 5, updates to 5 ✓
Process 2: User.update(redis.get) → gets 5, updates to 5 ✓
Process 4: User.update(redis.get) → gets 5, updates to 5 ✓
Process 1: User.update(redis.get) → gets 5, updates to 5 ✓

Final database value: 5 (correct!)

Even if Process 1 executes its database update last, it fetches the current counter value (5) from Redis, not the stale value it captured earlier (1).

Key Takeaways

  1. Atomic operations solve one problem, not all problems: Redis’s atomic incr prevents duplicate counter values, but doesn’t prevent race conditions in downstream systems.
  2. Variables capture point-in-time state: Once you store a value in a variable, it’s frozen in time. In concurrent systems, that “time” might be ancient history by the time you use it.
  3. Separate increment from sync: The increment operation (creating new state) and the sync operation (propagating that state) should be treated as distinct concerns.
  4. Fetch, don’t cache: When synchronizing between systems, always fetch the current value from the source of truth rather than relying on cached or captured values.
  5. Last write wins is dangerous: Without proper safeguards, the last database write determines the final state—and in concurrent systems, “last” is arbitrary.
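One common safeguard against arbitrary “last write wins” (takeaway 5) is to make the database write monotonic: only apply an update if it increases the stored value, typically via a guard in the UPDATE’s WHERE clause. A hedged sketch in plain Ruby—the guard is simulated here, not a real ORM call:

```ruby
# Simulated monotonic write: a stale value can never overwrite a newer one.
# In SQL this would be roughly:
#   UPDATE users SET post_counter = ? WHERE id = ? AND post_counter < ?
db_value = 0

monotonic_write = lambda do |v|
  db_value = v if v > db_value  # guard: only move the counter forward
end

# Same out-of-order writes as the broken example:
[5, 3, 2, 4, 1].each { |v| monotonic_write.call(v) }

puts db_value  # => 5
```

This variant is useful when you cannot re-fetch from the source of truth at write time, at the cost of assuming the counter only ever increases.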

Conclusion

Race conditions in distributed systems are subtle and often counterintuitive. The code looks correct, the atomic operations work perfectly, and yet the system produces wrong results under load.

The lesson here isn’t just about this specific Redis-to-database sync pattern—it’s about understanding that concurrent systems require constant vigilance about data freshness, operation ordering, and the gap between “what happened” and “what we know happened.”

Next time you write code that bridges two systems, ask yourself: “Am I using a value, or am I using a memory of a value?” The difference matters.


Have you encountered similar race conditions in your systems? I’d love to hear about your experiences and solutions in the comments below.
