Scaling Primitives
Load balancing
Lesson 1 of 5
What you'll learn
- Compare round-robin, least-connections, and hashing as distribution strategies
- Explain why health checks are what make horizontal scaling real
- Know which algorithm an interviewer expects for sticky vs stateless workloads
A load balancer fronts N backends and answers one question per request: which backend? The horizontal-scaling story only works if that choice keeps load even and routes around dead nodes. The algorithm and the health check are the two knobs that matter.
client -> LB -> pick(backends) -> backend
health-checked pool only
Round-robin vs least-connections vs hashing
Round-robin rotates through backends in order. It needs no load information, only a rotation pointer, and it is trivially fair when requests are uniform. The moment request cost varies (one slow query, one fat upload), round-robin keeps handing a node its regular share of new requests while that node is still busy with the expensive ones, because it ignores actual load.
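A minimal sketch of that failure mode, assuming three backends and a made-up cost per request. The class name, backend names, and cost numbers are illustrative only, not taken from any real load balancer.

```python
from itertools import cycle


class RoundRobin:
    """Rotate through backends in a fixed order, ignoring load."""

    def __init__(self, backends):
        self._ring = cycle(list(backends))

    def pick(self):
        return next(self._ring)


rr = RoundRobin(["a", "b", "c"])
picks = [rr.pick() for _ in range(6)]

# Hypothetical request costs: requests 0 and 3 are expensive.
costs = [10, 1, 1, 10, 1, 1]
work = {"a": 0, "b": 0, "c": 0}
for backend, cost in zip(picks, costs):
    work[backend] += cost

print(picks)  # ['a', 'b', 'c', 'a', 'b', 'c']
print(work)   # {'a': 20, 'b': 2, 'c': 2}
```

Each backend received exactly two requests, so the rotation is "fair" by count, yet backend a carries ten times the work of the others.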
Least-connections picks the backend with the fewest in-flight requests. It self-corrects under skew: a node stuck on slow work stops attracting new requests. The cost is shared state — the LB must track live connection counts — and it can still misjudge when "connection count" is a poor proxy for "actual load."
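The self-correcting behavior can be sketched as follows, assuming the balancer itself increments and decrements the in-flight counts. The class and method names are hypothetical, and ties are broken by dictionary order purely to keep the example deterministic.

```python
class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.in_flight = {b: 0 for b in backends}

    def pick(self):
        # min() returns the first backend with the lowest count,
        # so ties resolve in insertion order in this sketch.
        backend = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[backend] += 1
        return backend

    def release(self, backend):
        self.in_flight[backend] -= 1


lc = LeastConnections(["a", "b", "c"])
slow = lc.pick()  # "a" takes a slow request that stays in flight

picks = []
for _ in range(4):
    backend = lc.pick()
    picks.append(backend)
    lc.release(backend)  # fast requests finish immediately

print(slow, picks)  # a ['b', 'b', 'b', 'b']
```

While the slow request is outstanding, a never wins the comparison, so new work flows elsewhere. The fact that b takes every fast request here is an artifact of the tie-breaking rule, not of least-connections itself.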
Hashing maps a key (client IP, session ID, cache key) to a backend deterministically. As long as the pool is unchanged, the same key always lands on the same node, which buys you session stickiness and cache locality. The trade-off is that hashing fights even distribution: a hot key or skewed key space overloads one node, and naive hash % N remaps most keys when N changes (the next lesson's whole problem).
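A short sketch of both properties, assuming SHA-256 as the hash and synthetic session IDs as keys (both are choices made for this example). It shows that a key is sticky while N is fixed, and measures how many keys change backend when the pool grows from 4 to 5.

```python
import hashlib


def pick(key: str, n: int) -> int:
    """Map a key to one of n backends with a stable hash.

    hashlib is used instead of the built-in hash() because Python
    randomizes string hashes per process by default.
    """
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


# Stickiness: the same key maps to the same backend while n is fixed.
assert pick("session-42", 4) == pick("session-42", 4)

# Rehashing: compare assignments before and after adding a fifth node.
keys = [f"session-{i}" for i in range(10_000)]
before = {k: pick(k, 4) for k in keys}
after = {k: pick(k, 5) for k in keys}
moved = sum(before[k] != after[k] for k in keys)

print(f"{moved / len(keys):.0%} of keys changed backend going from 4 to 5 nodes")
```

With a uniform hash, a key keeps its backend only when hash % 4 equals hash % 5, which holds for 4 of every 20 residues. In expectation about 80% of keys move, even though only one node was added.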
round-robin -> simple, fair only if requests are uniform
least-conns -> adapts to skew, needs live load state
hash(key) % N -> sticky + cache-friendly, but skews & rehashes
What health checks actually buy you
Adding a node doesn't add capacity until traffic reaches it; a dead node keeps failing requests until traffic stops reaching it. Health checks are the mechanism for both. The LB probes each backend (GET /healthz) and only the passing set is eligible for selection. This is the difference between "we have 10 replicas" and "we have 10 replicas serving traffic."
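The filtering step can be sketched like this. The probe is injected as a function so the example runs without a network; fake_probe and the replica names are hypothetical stand-ins for real GET /healthz calls.

```python
from typing import Callable, Iterable


def eligible(backends: Iterable[str], probe: Callable[[str], bool]) -> list[str]:
    """Return the backends whose probe passes; an erroring probe counts as a failure."""
    passing = []
    for backend in backends:
        try:
            ok = probe(backend)
        except Exception:
            ok = False  # timeouts and connection errors mean "unhealthy"
        if ok:
            passing.append(backend)
    return passing


def fake_probe(backend: str) -> bool:
    """Hypothetical stand-in for an HTTP GET /healthz against the backend."""
    if backend == "replica-3":
        raise TimeoutError("probe timed out")
    return backend != "replica-2"  # replica-2 answers with a failing status


replicas = ["replica-1", "replica-2", "replica-3", "replica-4"]
pool = eligible(replicas, fake_probe)
print(f"{len(replicas)} replicas, {len(pool)} serving traffic: {pool}")
```

Whatever selection algorithm sits on top would then choose only from pool, never from the full replica list.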
The grading nuance: distinguish liveness (is the process up?) from readiness (can it serve now? — warm caches, open DB pool, not draining). Routing to a live-but-not-ready node during a deploy is a classic self-inflicted outage.
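One way to picture the distinction is a sketch with an assumed Backend record whose fields mirror the readiness conditions listed above. The field names and the three example nodes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    process_up: bool = True
    cache_warm: bool = False
    db_pool_open: bool = False
    draining: bool = False

    def live(self) -> bool:
        """Liveness: is the process up at all?"""
        return self.process_up

    def ready(self) -> bool:
        """Readiness: can it serve a request right now?"""
        return (
            self.process_up
            and self.cache_warm
            and self.db_pool_open
            and not self.draining
        )


# A hypothetical fleet in the middle of a rolling deploy.
fleet = [
    Backend("old", cache_warm=True, db_pool_open=True, draining=True),
    Backend("new", cache_warm=False, db_pool_open=True),
    Backend("steady", cache_warm=True, db_pool_open=True),
]

live = [b.name for b in fleet if b.live()]
routable = [b.name for b in fleet if b.ready()]
print(live)      # ['old', 'new', 'steady']
print(routable)  # ['steady']
```

A balancer that gates on liveness alone would send traffic to all three nodes, including the one still warming up and the one being drained.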
Stickiness is a liability, not a feature
Reach for hashing only when you genuinely need session affinity or cache locality. Sticky routing concentrates a hot user on one node and makes that node a single point of failure for every session pinned to it. Prefer stateless backends so any algorithm is safe.
Run it. Round-robin rotates blindly; least-connections routes away from the busy backend. Watch how the long-running request changes which one stays balanced.
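For readers without the interactive version, here is a small tick-based simulation of the same comparison. It assumes one request arrives per tick, three backends, and a first request that runs for 12 ticks while the rest run for 2. All of these numbers, and the simulate function itself, are invented for the sketch.

```python
import heapq
from itertools import cycle


def simulate(strategy: str, durations: list[int], backends=("a", "b", "c")) -> dict[str, int]:
    """One request arrives per tick; return the peak in-flight count per backend."""
    in_flight = {b: 0 for b in backends}
    peak = {b: 0 for b in backends}
    finishing = []  # min-heap of (finish_tick, backend)
    ring = cycle(backends)

    for tick, duration in enumerate(durations):
        # Complete every request that has finished by this tick.
        while finishing and finishing[0][0] <= tick:
            _, done = heapq.heappop(finishing)
            in_flight[done] -= 1

        if strategy == "round_robin":
            chosen = next(ring)
        else:  # "least_connections"
            chosen = min(in_flight, key=in_flight.get)

        in_flight[chosen] += 1
        peak[chosen] = max(peak[chosen], in_flight[chosen])
        heapq.heappush(finishing, (tick + duration, chosen))
    return peak


durations = [12] + [2] * 11  # the first request is the long-running one
rr_peak = simulate("round_robin", durations)
lc_peak = simulate("least_connections", durations)
print("round-robin peak in-flight:      ", rr_peak)  # {'a': 2, 'b': 1, 'c': 1}
print("least-connections peak in-flight:", lc_peak)  # {'a': 1, 'b': 1, 'c': 1}
```

Round-robin keeps returning to backend a while the long request is still running, so a stacks up a second request. Least-connections leaves a alone until that request finishes.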
Why does adding replicas only increase real capacity once health checks are in place?