AIgentic

Agentic Systems & LLM Tooling Daily

Benchmark: Python race-condition diagnosis across Claude models

All three Claude models correctly identified the cache race condition and provided working fixes. Sonnet 4.6 balanced clarity, completeness, and cost most effectively, while Opus offered exhaustive detail at 5x the price.


Claude Haiku 4.5, Sonnet 4.6, and Opus 4.7 were tasked with diagnosing a Python async cache race condition, explaining why it occurs, and providing a corrected implementation. All three models identified the bug correctly and offered working fixes, but differed significantly in explanation depth, code clarity, and cost efficiency.

Task

The following Python async function intermittently returns stale data. Identify the bug, explain why it happens, and provide a minimal corrected version.

```python
import asyncio

cache = {}

async def get_or_fetch(key, fetcher):
    if key in cache:
        return cache[key]
    value = await fetcher(key)
    cache[key] = value
    return value
```

Answer format: three sections (“Bug”, “Why it happens”, “Fix” with code).

Results

| Model | Latency (ms) | Input tokens | Output tokens | Cost (USD) | Verdict |
| --- | --- | --- | --- | --- | --- |
| Claude Haiku 4.5 | 4,177 | 121 | 391 | 0.00208 | Complete, correct, concise |
| Claude Sonnet 4.6 | 13,874 | 121 | 718 | 0.01113 | Complete, well-structured, optimal balance |
| Claude Opus 4.7 | 12,772 | 159 | 687 | 0.05391 | Complete, exhaustive, high cost |
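
The cost multiples quoted in this post can be recomputed from the table. The sketch below copies the table's figures into a dictionary; the names `results` and `cost_ratio` are illustrative, not part of the benchmark harness.

```python
# Figures copied from the results table above.
results = {
    "Haiku 4.5": {"latency_ms": 4177, "output_tokens": 391, "cost_usd": 0.00208},
    "Sonnet 4.6": {"latency_ms": 13874, "output_tokens": 718, "cost_usd": 0.01113},
    "Opus 4.7": {"latency_ms": 12772, "output_tokens": 687, "cost_usd": 0.05391},
}


def cost_ratio(a: str, b: str) -> float:
    """Return how many times more model `a` cost than model `b` on this task."""
    return results[a]["cost_usd"] / results[b]["cost_usd"]


for pricier, cheaper in [
    ("Opus 4.7", "Sonnet 4.6"),
    ("Opus 4.7", "Haiku 4.5"),
    ("Sonnet 4.6", "Haiku 4.5"),
]:
    print(f"{pricier} vs {cheaper}: {cost_ratio(pricier, cheaper):.1f}x")
```

These ratios describe a single run of a single prompt, so they are best read as rough multiples rather than stable pricing comparisons.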

Analysis

Claude Haiku 4.5 correctly identified the race condition and provided a solid per-key lock solution with double-checked locking. The explanation was direct: multiple coroutines can pass the cache check before any of them writes, causing inconsistency. The fix used one asyncio.Lock per key and included clear comments explaining why the pattern works. At 391 output tokens and 4.2 seconds, it was the shortest and fastest response. However, its “Why it happens” section compressed the event-loop mechanics into four numbered steps without elaborating on the single-threaded yield-point semantics that make this race possible in async code. A developer unfamiliar with asyncio might not fully grasp how a race condition can occur at all without threads.
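
For reference, here is a minimal sketch of a per-key lock with double-checked locking, the pattern described above. It is not the model's verbatim output: the `locks` dictionary, the `demo` helper, and `slow_fetcher` are assumptions made for illustration.

```python
import asyncio
from collections import defaultdict

cache = {}
# One lock per key, created on first use. The dict lookup contains no await,
# so creating a lock cannot itself be interleaved with another coroutine.
locks = defaultdict(asyncio.Lock)


async def get_or_fetch(key, fetcher):
    # First check: serve cache hits without touching the lock.
    if key in cache:
        return cache[key]
    async with locks[key]:
        # Second check: a coroutine ahead of us in the lock queue may have
        # filled the cache while we were waiting.
        if key in cache:
            return cache[key]
        value = await fetcher(key)
        cache[key] = value
        return value


async def demo(callers=5):
    """Fire concurrent requests for one key and count real fetches."""
    cache.clear()
    locks.clear()
    calls = 0

    async def slow_fetcher(key):  # hypothetical stand-in for a slow backend
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return f"value-for-{key}"

    results = await asyncio.gather(
        *(get_or_fetch("a", slow_fetcher) for _ in range(callers))
    )
    return calls, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

One design caveat: in this sketch the `locks` dictionary grows by one entry per distinct key and is never pruned, so a long-running service with many keys would need an eviction policy.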

Claude Sonnet 4.6 delivered the most complete and well-organized response. It explained the race condition as a “cache stampede” and provided a detailed walkthrough of the interleaving sequence, explicitly noting that asyncio coroutines run on a single thread but yield at every await point. The fix matched Haiku’s approach (per-key locks with double-checked locking) but included a table summarizing the key rationale behind each part of the solution. It also noted the efficiency gain of keeping different-key requests concurrent. The response was 718 tokens, cost 0.0111 USD, and took 13.9 seconds to generate. The extra detail made it more accessible to junior developers while remaining technically precise.

Claude Opus 4.7 provided an alternative and arguably more elegant fix using an asyncio.Future stored directly in the cache. Instead of using locks, concurrent callers await the same Future, ensuring only one fetch runs per key and all callers receive identical results. This approach is theoretically superior because there is no await between the check and the insert, making the pair atomic at the event-loop level. However, the explanation was denser and the code introduced exception-handling complexity (re-raising after popping the cache entry) that, while correct, adds cognitive load. Opus was the most expensive at 0.0539 USD (about 4.8x Sonnet and 26x Haiku) and offered a marginally better algorithm design that most production codebases would not need.
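
A futures-in-cache fix along these lines might look like the sketch below. It is a reconstruction under assumptions, not Opus's verbatim code: the cancellation branch and the `demo` helper are additions made so the example is safe to run, and the cache here stores Future objects rather than raw values.

```python
import asyncio

cache = {}  # maps key -> asyncio.Future that holds (or will hold) the value


async def get_or_fetch(key, fetcher):
    fut = cache.get(key)
    if fut is not None:
        # Join the in-flight (or already finished) fetch for this key.
        return await fut

    # No await has happened since the lookup above, so no other coroutine
    # can run between the check and this insert.
    fut = asyncio.get_running_loop().create_future()
    cache[key] = fut
    try:
        value = await fetcher(key)
    except asyncio.CancelledError:
        cache.pop(key, None)
        fut.cancel()  # wake any waiters instead of leaving them pending
        raise
    except Exception as exc:
        cache.pop(key, None)  # do not cache failures; later calls retry
        fut.set_exception(exc)
        fut.exception()  # mark as retrieved so asyncio logs no warning
        raise
    fut.set_result(value)
    return value


async def demo(callers=5):
    """Fire concurrent requests for one key and count real fetches."""
    cache.clear()
    calls = 0

    async def slow_fetcher(key):  # hypothetical stand-in for a slow backend
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return key.upper()

    results = await asyncio.gather(
        *(get_or_fetch("a", slow_fetcher) for _ in range(callers))
    )
    return calls, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The two `except` branches are where the extra cognitive load comes from: a failed or cancelled fetch has to be removed from the cache and propagated to every waiter, which the lock-based version gets for free.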

Winner and why

Claude Sonnet 4.6 is the best choice for this task. It achieved the best balance of correctness, clarity, and cost. All three models identified the core bug and provided working solutions, but Sonnet’s explanation of the yield-point semantics was clearer than Haiku’s, and its pedagogical structure (explicit numbering, a summary table, efficiency notes) was superior without Opus’s unnecessary complexity. For a developer writing concurrent cache code, Sonnet’s per-key lock approach is more immediately understandable and easier to maintain than Opus’s Future-based trick. At 0.0111 USD, Sonnet cost roughly a fifth of what Opus did while communicating more clearly. Haiku was the fastest and cheapest, but its terse explanation of event-loop mechanics left gaps in reasoning. Sonnet filled those gaps without overshooting into advanced patterns.

Takeaways

All three models understood the fundamental issue. The race condition stems from a gap between the check and the write at an await suspension point. This is a core async pattern error, and no model struggled with it or proposed thread-unsafe solutions.

Explanation depth scales with model size, but Sonnet found the sweet spot. Haiku’s brevity was efficient but incomplete for the “why it happens” section. Opus was thorough but introduced algorithmic complexity that obscured the core lesson. Sonnet balanced depth and accessibility, making it the best for teaching.

Cost-per-quality ratio strongly favors Sonnet in practical scenarios. Haiku is acceptable for time-constrained, cost-sensitive deployments (e.g., batch diagnosis), but risks insufficient explanation. Opus’s Future-based approach is intellectually interesting but not worth roughly 5x the cost unless the codebase explicitly requires atomic non-awaited checks. Sonnet’s per-key lock pattern is widely used, well-explained, and economical.

Production preference: per-key locks over futures-in-cache. While Opus’s Future approach is theoretically atomic, managing Future lifecycles and exception handling adds operational friction. The lock-based approach (Haiku, Sonnet) is a familiar, easy-to-audit idiom for concurrent cache patterns in Python async code.

Frequently asked

What is the core bug in the original async cache function?

The function has a race condition where multiple concurrent coroutines can bypass the cache check before any of them writes to it, causing duplicate fetches and inconsistent results. The gap between the `if key in cache` check and the `cache[key] = value` assignment allows interleaving at the `await` suspension point.
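
The interleaving can be reproduced with the original function unchanged. In the sketch below, the fetcher is a hypothetical backend whose earlier calls take longer than later ones (an assumption chosen so that the oldest value is written to the cache last); `demo` and the `user:1` key are illustrative.

```python
import asyncio

cache = {}


async def get_or_fetch(key, fetcher):
    # Original (buggy) function from the task, unchanged.
    if key in cache:
        return cache[key]
    value = await fetcher(key)
    cache[key] = value
    return value


async def demo():
    cache.clear()
    calls = 0

    async def fetcher(key):
        # Hypothetical backend: every call returns a newer "version", and
        # earlier calls are slower, so the oldest result arrives last.
        nonlocal calls
        calls += 1
        version = calls
        await asyncio.sleep(0.03 / version)
        return f"{key}-v{version}"

    # All three callers miss the cache, because none has written to it yet
    # when the others run their `if key in cache` check.
    results = await asyncio.gather(
        *(get_or_fetch("user:1", fetcher) for _ in range(3))
    )
    return calls, results, cache["user:1"]


if __name__ == "__main__":
    calls, results, cached = asyncio.run(demo())
    print("fetcher calls:", calls)
    print("results seen by callers:", results)
    print("value left in cache:", cached)
```

Under these assumed delays the run should report three fetcher calls for a single key, three different values handed to the callers, and the cache left holding the oldest version, which is the stale-data symptom from the task.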

Why do async functions need different locking strategies than threaded code?

Async functions run on a single thread but yield at `await` points, allowing other coroutines to interleave. Per-key locks (asyncio.Lock) or futures-in-cache approaches ensure only one fetch happens per key, while different keys remain concurrent without thread overhead.
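
To illustrate why the lock is per key rather than global, the sketch below measures the peak number of overlapping fetches under each policy. The helper names (`peak_in_flight`, `lock_for_key`) are hypothetical and the `asyncio.sleep` call stands in for real I/O.

```python
import asyncio
from collections import defaultdict


async def peak_in_flight(lock_for_key, keys):
    """Fetch each key once under a locking policy; return peak overlap."""
    in_flight = 0
    peak = 0

    async def fetch(key):
        nonlocal in_flight, peak
        async with lock_for_key(key):
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for network or disk I/O
            in_flight -= 1

    await asyncio.gather(*(fetch(k) for k in keys))
    return peak


async def main():
    keys = ["a", "b", "c"]
    global_lock = asyncio.Lock()
    per_key_locks = defaultdict(asyncio.Lock)

    # One shared lock: fetches for unrelated keys queue behind each other.
    with_global = await peak_in_flight(lambda key: global_lock, keys)
    # One lock per key: only callers of the same key wait on each other.
    with_per_key = await peak_in_flight(lambda key: per_key_locks[key], keys)
    return with_global, with_per_key


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With a single global lock the peak stays at one fetch at a time, while per-key locks let all three distinct keys be fetched concurrently.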

Which fix approach is safest for cache consistency?

Storing an in-flight Future or Task in the cache (Opus approach) is the most robust: it makes the check-and-insert atomic with respect to the event loop. However, per-key locks with double-checked locking (Haiku/Sonnet approach) are simpler and equally effective for most workloads.

What is double-checked locking and why does it matter here?

Double-checked locking re-checks the cache after acquiring the lock, preventing redundant fetches from coroutines that were queued while waiting for the lock. The first check avoids lock overhead on cache hits; the second check inside the lock handles concurrent arrivals.
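
The effect of the second check can be seen by switching it off. The sketch below is an illustrative experiment, not code from any of the model responses; `count_fetches` and its `recheck` flag are assumptions introduced for the comparison.

```python
import asyncio
from collections import defaultdict


async def count_fetches(recheck: bool, callers: int = 4) -> int:
    """Run `callers` concurrent requests for one key; return fetch count."""
    cache = {}
    locks = defaultdict(asyncio.Lock)
    fetches = 0

    async def fetcher(key):  # hypothetical slow backend
        nonlocal fetches
        fetches += 1
        await asyncio.sleep(0.01)
        return key.upper()

    async def get_or_fetch(key):
        if key in cache:  # first check: lock-free fast path
            return cache[key]
        async with locks[key]:
            # Second check: was the cache filled while we queued for the lock?
            if recheck and key in cache:
                return cache[key]
            value = await fetcher(key)
            cache[key] = value
            return value

    await asyncio.gather(*(get_or_fetch("a") for _ in range(callers)))
    return fetches


if __name__ == "__main__":
    print("lock only:", asyncio.run(count_fetches(recheck=False)))
    print("lock + re-check:", asyncio.run(count_fetches(recheck=True)))
```

Without the re-check, the lock merely serializes the redundant fetches: every queued caller still fetches once it acquires the lock. With the re-check, only the first caller fetches and the rest read the cached value.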
