Scaling Primitives
Queues & backpressure
Lesson 4 of 5
What you'll learn
- Explain async decoupling and at-least-once delivery as a trade
- Show why a bounded queue with backpressure beats an unbounded one
- Quantify the latency cost a queue adds in exchange for throughput
A queue sits between a fast producer and a slower consumer so neither has to run at the other's pace. It buys decoupling (the producer doesn't block on the consumer), burst absorption (a spike fills the queue instead of dropping requests), and independent scaling (add workers without touching producers). The bill: every queued item waits, so you trade latency for throughput and resilience.
producer --enqueue--> [ queue ] --dequeue--> worker pool
  fast                 buffer                   slow
At-least-once and the decoupling tax
Most durable queues guarantee at-least-once delivery: a message is redelivered if a worker crashes before acknowledging, so nothing is silently lost. The consequence you must design for is duplicates — the same message can be processed twice, so handlers have to be idempotent (processing twice equals processing once). Exactly-once across a network is effectively a myth; the honest framing in an interview is "at-least-once delivery plus idempotent consumers."
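A minimal sketch of an idempotent consumer, assuming every message carries a unique ID. The names `IdempotentHandler`, `credit`, and the IDs `m-1` and `m-2` are hypothetical, chosen only for illustration. The set of seen IDs lives in memory here; a real system would need to store it durably and update it together with the side effect.

```python
class IdempotentHandler:
    """Wraps a side-effecting function so a redelivered message is applied once."""

    def __init__(self, apply):
        self._apply = apply   # the real work, e.g. crediting an account
        self._seen = set()    # IDs already processed (in-memory, for the sketch only)

    def handle(self, message_id, payload):
        if message_id in self._seen:
            return False      # duplicate delivery: acknowledge it, do no work
        self._apply(payload)
        self._seen.add(message_id)  # record the ID only after the work succeeds
        return True


balance = {"total": 0}


def credit(amount):
    balance["total"] += amount


handler = IdempotentHandler(credit)

# Simulate at-least-once delivery: "m-1" arrives twice because an ack was lost.
results = [
    handler.handle("m-1", 50),
    handler.handle("m-1", 50),
    handler.handle("m-2", 20),
]
print(results, balance["total"])  # [True, False, True] 70
```

The duplicate is acknowledged but skipped, so processing twice equals processing once.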
The decoupling tax is end-to-end latency and observability. A request that used to be synchronous now returns before the work is done, so the user sees "accepted," not "completed." You inherit retries, dead-letter queues, and the question "is it slow, or did it fail and we're retrying?" Use a queue when the work can be async; don't hide a synchronous dependency behind one.
Bounded queues and backpressure
An unbounded queue looks safe — it never rejects — but it converts an overload into two worse failures. First, latency grows without limit: if arrival rate exceeds service rate, the queue depth climbs forever and items wait minutes, then hours, while the producer happily keeps enqueuing. Second, the queue eventually exhausts memory and the whole process dies — you turned a partial overload into a total outage.
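To put numbers on the first failure, here is a rough one-second-step model. The rates (100 items/s arriving, 80 items/s served) are made-up figures for illustration, and `queue_depth_over_time` is a hypothetical helper, not part of any library.

```python
def queue_depth_over_time(arrival_rate, service_rate, seconds):
    """Return the queue depth at the end of each second for an unbounded queue."""
    depth = 0
    history = []
    for _ in range(seconds):
        depth += arrival_rate              # producer enqueues without limit
        depth -= min(depth, service_rate)  # workers drain what they can
        history.append(depth)
    return history


history = queue_depth_over_time(arrival_rate=100, service_rate=80, seconds=3600)

depth_after_hour = history[-1]                    # backlog grows by 20 items per second
wait_for_newest_seconds = depth_after_hour / 80   # time to drain everything ahead of it

print(depth_after_hour, wait_for_newest_seconds)  # 72000 900.0
```

Under these assumed rates, one hour of a 25% overload leaves the newest item waiting about 15 minutes, and the backlog is still climbing.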
A bounded queue forces the decision early. When it is full you must choose: reject the new item (shed load), block the producer (apply backpressure), or drop the oldest item. Backpressure is the signal "I'm full, slow down" propagating upstream, so the system degrades gracefully instead of collapsing. By Little's Law, L = λ × W, so W = L / λ: capping L (the number of items in the queue) caps W (the average wait) for a given rate λ of admitted items. That bound is the point: fast failure beats unbounded latency.
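The Little's Law bound can be worked as plain arithmetic. This is a sketch under assumed numbers: a capacity of 1,000 items and a throughput of 200 items/s are illustrative, and `max_average_wait_seconds` is a hypothetical helper name.

```python
def max_average_wait_seconds(max_queue_length, throughput_per_second):
    """Rearrange Little's Law (L = lambda * W) to W = L / lambda.

    With the queue length capped at max_queue_length, this is an upper
    bound on the average time an admitted item spends waiting.
    """
    if throughput_per_second <= 0:
        raise ValueError("throughput must be positive")
    return max_queue_length / throughput_per_second


bound = max_average_wait_seconds(max_queue_length=1000, throughput_per_second=200)
print(bound)  # 5.0
```

Halving the capacity halves the bound, which is why queue size is a latency decision as much as a memory decision.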
Unbounded queues defer the failure, they don't prevent it
An unbounded queue feels resilient in a demo and detonates in production: it absorbs a sustained overload silently until latency is unusable or memory is gone. Always bound the queue and decide explicitly what happens when it's full.
Run it. The queue caps at 3 items; a fast producer outruns the worker, so enqueues past the cap are rejected — backpressure instead of unbounded growth.
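A minimal, deterministic sketch of that behavior using Python's standard `queue.Queue`. The tick loop is an assumption made for repeatability: the producer offers two items per tick and the worker handles one. The helper `run` and its parameters are hypothetical names.

```python
import queue


def run(capacity=3, ticks=5, produced_per_tick=2):
    """Simulate a fast producer and a slow worker sharing a bounded queue."""
    q = queue.Queue(maxsize=capacity)
    accepted, rejected, processed = [], [], []
    next_id = 0
    for _ in range(ticks):
        # Producer: tries to enqueue without blocking.
        for _ in range(produced_per_tick):
            try:
                q.put_nowait(next_id)
                accepted.append(next_id)
            except queue.Full:
                rejected.append(next_id)  # backpressure: the caller is told "no"
            next_id += 1
        # Worker: handles at most one item per tick.
        try:
            processed.append(q.get_nowait())
        except queue.Empty:
            pass
    return accepted, rejected, processed, q.qsize()


accepted, rejected, processed, remaining = run()
print("accepted: ", accepted)    # [0, 1, 2, 3, 4, 6, 8]
print("rejected: ", rejected)    # [5, 7, 9]
print("processed:", processed)   # [0, 1, 2, 3, 4]
print("remaining:", remaining)   # 2
```

Once the queue holds 3 items, every second enqueue is refused, so the depth never exceeds the cap. Replacing `put_nowait` with a blocking `put` would slow the producer down instead of rejecting its items.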
Why is a bounded queue with backpressure preferable to an unbounded queue under sustained overload?