Why does the original cache function have a race condition?

Multiple concurrent coroutines can all pass the `if key in cache` check before any of them writes the result back. Because `await` is a suspension point, the event loop can switch to another coroutine between the check and the cache write. Every coroutine that checks before the first write sees a miss. The result is duplicate fetches and, if the fetches return different values, stale-data overwrites.
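
To see the duplicate fetches concretely, here is a small sketch that runs the original function under `asyncio.gather`. The counting fetcher, its 10 ms delay, and the `count_fetches` helper are hypothetical test scaffolding, not part of the original code.

```python
import asyncio

cache = {}


async def get_or_fetch(key, fetcher):
    # Same check-then-act logic as the original (buggy) function.
    if key in cache:
        return cache[key]
    value = await fetcher(key)  # suspension point: other coroutines run here
    cache[key] = value
    return value


async def count_fetches(n_callers=5):
    """Run n_callers concurrent lookups of one missing key; return fetcher call count."""
    cache.clear()
    calls = 0

    async def slow_fetcher(key):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated I/O latency (hypothetical)
        return f"value-for-{key}"

    await asyncio.gather(*(get_or_fetch("user:1", slow_fetcher) for _ in range(n_callers)))
    return calls


print(asyncio.run(count_fetches()))  # 5: every caller missed the cache
```

Each task runs up to its `await` before any fetch completes, so all five see an empty cache and all five call the fetcher.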

What are the two main approaches to fix this bug?

Per-key locking (using `asyncio.Lock`) ensures only one coroutine fetches per key while others wait. In-flight task tracking stores a pending `Future` or `Task` so concurrent callers await the same fetch result, eliminating duplicates without explicit locks.

Which fix is simpler to understand and maintain?

The in-flight task approach is often simpler: store the task returned by `asyncio.create_task(fetcher(key))` in the cache dict itself and have every caller await it. This avoids a separate lock dict and is more idiomatic for async Python.
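
A minimal sketch of the task-in-cache pattern, assuming the cache stores `asyncio.Task` objects instead of plain values. Two details go beyond the one-line description and are our own choices: a done-callback (the hypothetical `_evict_if_failed`) that drops failed or cancelled tasks so later calls can retry, and `asyncio.shield()` so that one caller's cancellation does not cancel the fetch the other callers are awaiting. `demo` is illustrative scaffolding.

```python
import asyncio

cache = {}  # key -> asyncio.Task; a finished task doubles as the cached value


def _evict_if_failed(key, task):
    # Done-callback (hypothetical helper): forget failed or cancelled fetches
    # so a later call retries instead of re-raising the same error forever.
    if task.cancelled() or task.exception() is not None:
        if cache.get(key) is task:
            del cache[key]


async def get_or_fetch(key, fetcher):
    task = cache.get(key)
    if task is None:
        # No await between the lookup above and this store, so the first
        # caller creates the task and every later caller finds it.
        task = asyncio.create_task(fetcher(key))
        task.add_done_callback(lambda t: _evict_if_failed(key, t))
        cache[key] = task
    # shield(): cancelling one caller must not cancel the shared fetch.
    return await asyncio.shield(task)


async def demo(n_callers=5):
    cache.clear()
    calls = 0

    async def fetcher(key):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated I/O
        return key.upper()

    results = await asyncio.gather(*(get_or_fetch("user:1", fetcher) for _ in range(n_callers)))
    return calls, results


print(asyncio.run(demo()))  # (1, ['USER:1', 'USER:1', 'USER:1', 'USER:1', 'USER:1'])
```

Because nothing awaits between `cache.get(key)` and `cache[key] = task`, the check and the store happen atomically with respect to the event loop, which is what closes the race.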

Can whichever coroutine finishes last cause stale data?

Yes, in the buggy original code. If multiple fetches run concurrently and a slower fetch (or one querying an older data source) completes last, it overwrites the cache with stale data, even though a faster fetch already stored fresher data.
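
The following sketch reproduces that out-of-order completion with the original function. The two "replicas" are simulated by a hypothetical fetcher that returns an old value after 50 ms on its first call and a fresh value after 10 ms on its second; the delays are arbitrary.

```python
import asyncio

cache = {}


async def get_or_fetch(key, fetcher):
    # Original, unsynchronized version.
    if key in cache:
        return cache[key]
    value = await fetcher(key)
    cache[key] = value
    return value


async def stale_overwrite_demo():
    cache.clear()
    # Hypothetical replicas: first call -> slow, lagging replica (old data);
    # second call -> fast, up-to-date replica (fresh data).
    plan = [(0.05, "v1 (old)"), (0.01, "v2 (fresh)")]

    async def fetcher(key):
        delay, value = plan.pop(0)
        await asyncio.sleep(delay)
        return value

    slow = asyncio.create_task(get_or_fetch("config", fetcher))  # starts first
    fast = asyncio.create_task(get_or_fetch("config", fetcher))
    await asyncio.gather(slow, fast)
    return cache["config"]


print(asyncio.run(stale_overwrite_demo()))  # v1 (old): the late, stale result wins
```

Between the 10 ms and 50 ms marks, a reader would get the fresh value from the cache; after that, the cache reverts to the stale one, which is exactly the intermittent symptom.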

Benchmark: Python race-condition diagnosis across Claude

Claude Sonnet 4.6 and Opus 4.7 both produce correct, production-ready fixes for the async cache race condition, with Opus offering slightly more pedagogical depth and a second elegant alternative. Haiku 4.5 correctly identifies the bug and provides a sound lock-based fix (plus an in-flight alternative) but explains the failure modes less thoroughly.

Task

The following Python async function intermittently returns stale data. Identify the bug, explain why it happens, and provide a minimal corrected version.

```python
import asyncio

cache = {}

async def get_or_fetch(key, fetcher):
    if key in cache:
        return cache[key]
    value = await fetcher(key)
    cache[key] = value
    return value
```

Answer format: three sections (“Bug”, “Why it happens”, “Fix” with code).

Results

| Model | Latency (ms) | Input tokens | Output tokens | Cost (USD) | Verdict |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5555 | 121 | 576 | 0.003 | Correct and concise; covers lock and in-flight alternatives. |
| Claude Sonnet 4.6 | 18645 | 121 | 859 | 0.01325 | Complete; in-flight Future implementation with table and exception handling. |
| Claude Opus 4.7 | 19669 | 159 | 870 | 0.06764 | Correct on all counts; explains stale data scenario; two complete fixes. |

Analysis

Claude Haiku 4.5 delivers a crisp, correct answer. It identifies the race condition in the check-then-act pattern, explains that await creates a suspension point, and offers two working solutions: a lock-based approach using asyncio.Lock and an in-flight task deduplication strategy. The lock implementation is straightforward; the in-flight alternative is elegant. Haiku’s explanation of why the bug happens is accurate but briefer than the others, omitting specific discussion of stale data caused by out-of-order completion. The response is fast and cheap, meeting the minimal requirements.

Claude Sonnet 4.6 goes deeper. It provides a detailed, step-by-step race sequence showing exactly how two coroutines can both see a cache miss, suspend at await, and duplicate the fetch. Critically, Sonnet explicitly mentions the “stale or inconsistent data” problem if fetcher return values differ between invocations, which directly addresses the prompt’s phrase “intermittently returns stale data.” The fix uses in-flight tracking with an asyncio.Future, wraps it in asyncio.shield(), and includes exception handling (try/except/finally) to propagate errors to all waiters and clean up the pending dict. A comparison table shows the behavior under four scenarios. The implementation is more robust than Haiku’s; it handles errors and uses explicit futures rather than relying on task semantics.
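
The article describes Sonnet's fix without reproducing it, so the following is only a sketch of the same pattern under our own assumptions: a separate `_pending` dict of `asyncio.Future` objects, `asyncio.shield()` on the waiters' side, and `try/except/finally` in the single fetching coroutine. The names `_pending` and `demo`, and the choice to cancel waiters when the fetching coroutine itself is cancelled, are hypothetical.

```python
import asyncio

cache = {}     # key -> fetched value
_pending = {}  # key -> asyncio.Future for a fetch still in progress (hypothetical name)


async def get_or_fetch(key, fetcher):
    if key in cache:
        return cache[key]
    fut = _pending.get(key)
    if fut is not None:
        # A fetch is already running: share its outcome instead of starting
        # a duplicate. shield() keeps this caller's cancellation from
        # cancelling the future that the other callers depend on.
        return await asyncio.shield(fut)

    # This caller becomes the single fetcher for `key`.
    fut = asyncio.get_running_loop().create_future()
    _pending[key] = fut
    try:
        value = await fetcher(key)
    except asyncio.CancelledError:
        fut.cancel()            # waiters see the cancellation too
        raise
    except Exception as exc:
        fut.set_exception(exc)  # every waiter re-raises the same error
        fut.exception()         # mark as retrieved in case nobody was waiting
        raise
    else:
        cache[key] = value
        fut.set_result(value)
        return value
    finally:
        # Runs on success and failure, so a later call can retry.
        _pending.pop(key, None)


async def demo(fail=False, n_callers=3):
    cache.clear()
    _pending.clear()
    calls = 0

    async def fetcher(key):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated I/O
        if fail:
            raise ValueError(f"upstream error for {key}")
        return f"value-for-{key}"

    results = await asyncio.gather(
        *(get_or_fetch("k", fetcher) for _ in range(n_callers)),
        return_exceptions=True,
    )
    return calls, results, dict(_pending)
```

With `fail=True`, the fetcher runs once and all three callers receive the same `ValueError`; either way `_pending` is empty afterwards. Cancelling the waiters along with the fetcher is one policy; a variant could let waiters retry instead.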

Claude Opus 4.7 similarly diagnoses the bug correctly and emphasizes the stale-data failure mode: if a slower fetch started earlier returns data captured from an older replica, and overwrites a fresher value already cached by a faster fetch, callers receive stale data. The explanation of why it happens is concise and accurate. Opus provides two complete, working fixes: first, a per-key asyncio.Lock with double-checked locking (re-check after acquiring the lock), and second, an in-flight task approach that stores asyncio.create_task() directly in the cache dict. Both are correct. The second approach is presented as “often cleaner” and more idiomatic. Opus’s lock cleanup (_locks.pop(key, None)) and the note on optional lock management show careful implementation thinking.
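
A sketch of the first fix as described: a per-key `asyncio.Lock`, a re-check after acquiring the lock, and `_locks.pop(key, None)` cleanup. It is not Opus's actual code; popping the lock only after a successful fetch, and the `demo` helper, are our assumptions.

```python
import asyncio

cache = {}
_locks = {}  # key -> asyncio.Lock serializing fetches of that key


async def get_or_fetch(key, fetcher):
    # Fast path: cached values need no locking.
    if key in cache:
        return cache[key]
    lock = _locks.setdefault(key, asyncio.Lock())
    async with lock:
        # Double-check: another coroutine may have filled the cache
        # while this one waited for the lock.
        if key in cache:
            return cache[key]
        value = await fetcher(key)
        cache[key] = value
        # Optional cleanup (assumed policy): once the value is cached the lock
        # is no longer needed. Coroutines already waiting on this lock object
        # still hit the re-check above, and new callers take the fast path.
        # If fetcher raises, this line is skipped and the lock stays, so
        # retries remain serialized.
        _locks.pop(key, None)
        return value


async def demo(n_callers=5):
    cache.clear()
    _locks.clear()
    calls = 0

    async def fetcher(key):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated I/O
        return f"value-for-{key}"

    results = await asyncio.gather(*(get_or_fetch("k", fetcher) for _ in range(n_callers)))
    return calls, results, dict(_locks)
```

Five concurrent callers trigger a single fetch, and the lock dict is empty afterwards, so it does not grow with every key ever requested.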

Winner and why

Claude Sonnet 4.6 is the winner. While Opus is marginally more comprehensive and provides two alternatives, Sonnet’s solution is production-ready, includes explicit exception handling and cleanup, and adds a comparison table that makes the fix’s correctness immediately clear. Sonnet’s in-flight Future approach, wrapped with asyncio.shield() and try/except/finally, is more defensible in edge cases (e.g., exception propagation, cancellation) than Opus’s simpler task-in-cache pattern. The cost difference (0.01325 vs. 0.06764 USD) is meaningful for a benchmark at scale: Sonnet costs less than 20% of Opus while delivering equivalent correctness and superior clarity. Haiku is excellent value (0.003 USD) and entirely correct, but its briefer treatment of stale data and lack of exception handling in the code make it marginally less suitable for production guidance. Sonnet threads the needle: thorough enough to earn confidence, lean enough to be efficient.

Takeaways

Stale-data framing matters. All three models correctly identify the race condition, but Sonnet and Opus explicitly call out that late-completing fetches can overwrite fresher cache entries, which is the mechanism behind the “intermittently returns stale data” symptom in the prompt. Haiku mentions deduplication but not the out-of-order completion problem.
Exception handling separates rough from production code. Sonnet’s in-flight Future approach includes try/except/finally to propagate fetcher exceptions to all waiters and clean up the pending dict even on failure. Opus’s lock solution with a simple re-check is correct but doesn’t explicitly address error propagation. Haiku’s alternatives are solid but silent on errors.
In-flight task deduplication is more idiomatic than explicit locking for async Python. Both Sonnet and Opus prefer storing a pending Future or Task in a dict and having concurrent callers await it. Opus stores the task directly in the cache dict, so the cache doubles as a results store and an in-flight tracker; Sonnet keeps a separate pending dict. Either way, the pattern avoids a lock dict and fits Python’s cooperative concurrency model. Haiku’s lock approach is correct but reads more like a translation of a threaded pattern.
Cost scales with depth, not correctness. Haiku (0.003 USD, 5.6 s latency) and Sonnet (0.01325 USD, 18.6 s latency) are both entirely correct; Opus (0.06764 USD, 19.7 s latency) adds little correctness beyond Sonnet but costs about 5x as much. For standard diagnosis tasks, Sonnet offers the best signal per token.
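
The cost and latency figures in these takeaways can be recomputed from the Results table; a small sketch (the dict simply restates the table's numbers):

```python
# Figures copied from the Results table: (latency in ms, cost in USD).
results = {
    "Claude Haiku 4.5": (5555, 0.003),
    "Claude Sonnet 4.6": (18645, 0.01325),
    "Claude Opus 4.7": (19669, 0.06764),
}

for model, (latency_ms, cost_usd) in results.items():
    print(f"{model}: {latency_ms / 1000:.1f} s, {cost_usd} USD")

sonnet_cost = results["Claude Sonnet 4.6"][1]
opus_cost = results["Claude Opus 4.7"][1]
sonnet_share_of_opus = sonnet_cost / opus_cost
opus_multiple_of_sonnet = opus_cost / sonnet_cost
print(f"Sonnet costs {sonnet_share_of_opus:.1%} of Opus")   # Sonnet costs 19.6% of Opus
print(f"Opus costs {opus_multiple_of_sonnet:.1f}x Sonnet")  # Opus costs 5.1x Sonnet
```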
