Scaling Primitives
Caching
Lesson 2 of 5
What you'll learn
- Reason about cache-aside, TTL, and LRU eviction as explicit trade-offs
- Name the two hard problems: invalidation and stampede
- Pick a strategy given a read/write ratio and a staleness budget
A cache is a bet: most requests want a small, hot subset of the data, so you keep that subset close and fast. The bet pays in latency and reduced load on the source of truth. The bill comes as staleness (you may serve old data) and memory (the cache is finite, so something gets evicted).
read: cache HIT -> return cached value (fast path)
cache MISS -> read DB, store in cache, return
Cache-aside, TTL, and LRU
Cache-aside (lazy loading) is the default: the application checks the cache, and on a miss reads the source and populates the cache. Only requested data is cached, and the cache can never be the reason a write succeeds or fails. The cost is that every miss pays the full latency, and the first request after expiry is always slow.
TTL bounds staleness without tracking writes: an entry expires after a fixed window, so the data is at most ttl seconds old. It is the cheapest correctness story you can buy — no invalidation logic — and the right default when slightly-stale reads are acceptable. Shorten the TTL to favor freshness, lengthen it to favor hit rate.
LRU eviction decides what to drop when the cache is full: discard the least-recently-used entry, betting that recent access predicts future access. It approximates "keep the working set." The trade-off is that a large scan (one pass over cold data) can evict your hot set — the reason real systems reach for LRU variants like LRU-K or ARC under scan-heavy load.
Invalidation and stampede
There are only two hard things in computer science: cache invalidation and naming things.
Invalidation is hard because the cache and the source of truth are two copies that drift the instant the source changes. TTL sidesteps it by accepting bounded staleness. If you need fresher data, you must actively evict or update on write — and now a missed invalidation means serving wrong data indefinitely, which is worse than slow.
Stampede (a.k.a. cache miss storm / thundering herd) is the failure mode of a popular key expiring: every concurrent reader misses at once and hammers the source in lockstep, often taking it down right when load is highest. The standard defenses are request coalescing (the first miss fetches, the rest wait on that single in-flight load) and early/jittered expiry so keys don't all expire on the same tick.
Coalesce the miss, not just the hit
A high hit rate hides a fragile miss path. Under a stampede, the cache is empty for that key, so your protection has to live on the miss — single-flight the fetch so one request refills the cache and the rest reuse the result.
Run it. The cache holds 2 entries; get() marks a key most-recently-used, so the next set() evicts whichever key was touched least recently.
Why must stampede protection live on the cache miss path rather than the hit path?
Saved on this device. Sign in to sync your progress everywhere.