Scaling Primitives

Caching

Lesson 2 of 5

What you'll learn

Reason about cache-aside, TTL, and LRU eviction as explicit trade-offs
Name the two hard problems: invalidation and stampede
Pick a strategy given a read/write ratio and a staleness budget

A cache is a bet: most requests want a small, hot subset of the data, so you keep that subset close and fast. The bet pays in latency and reduced load on the source of truth. The bill comes as staleness (you may serve old data) and memory (the cache is finite, so something gets evicted).

read: cache HIT  -> return cached value        (fast path)
      cache MISS -> read DB, store in cache, return

Cache-aside, TTL, and LRU

Cache-aside (lazy loading) is the default: the application checks the cache, and on a miss reads the source and populates the cache. Only requested data is cached, and the cache can never be the reason a write succeeds or fails. The cost is that every miss pays the full latency, and the first request after expiry is always slow.

TTL bounds staleness without tracking writes: an entry expires after a fixed window, so the data is at most ttl seconds old. It is the cheapest correctness story you can buy — no invalidation logic — and the right default when slightly-stale reads are acceptable. Shorten the TTL to favor freshness, lengthen it to favor hit rate.

LRU eviction decides what to drop when the cache is full: discard the least-recently-used entry, betting that recent access predicts future access. It approximates "keep the working set." The trade-off is that a large scan (one pass over cold data) can evict your hot set — the reason real systems reach for LRU variants like LRU-K or ARC under scan-heavy load.

Invalidation and stampede

There are only two hard things in computer science: cache invalidation and naming things.

Invalidation is hard because the cache and the source of truth are two copies that drift the instant the source changes. TTL sidesteps it by accepting bounded staleness. If you need fresher data, you must actively evict or update on write — and now a missed invalidation means serving wrong data indefinitely, which is worse than slow.

Stampede (a.k.a. cache miss storm / thundering herd) is the failure mode of a popular key expiring: every concurrent reader misses at once and hammers the source in lockstep, often taking it down right when load is highest. The standard defenses are request coalescing (the first miss fetches, the rest wait on that single in-flight load) and early/jittered expiry so keys don't all expire on the same tick.

Coalesce the miss, not just the hit

A high hit rate hides a fragile miss path. Under a stampede, the cache is empty for that key, so your protection has to live on the miss — single-flight the fetch so one request refills the cache and the rest reuse the result.

LRU cache with eviction

Run it. The cache holds 2 entries; get() marks a key most-recently-used, so the next set() evicts whichever key was touched least recently.

Loading editor…

Knowledge check

Why must stampede protection live on the cache miss path rather than the hit path?

Saved on this device. Sign in to sync your progress everywhere.

PreviousLoad balancing Next Consistent hashing