Scaling Primitives

Load balancing

Lesson 1 of 5

What you'll learn

Compare round-robin, least-connections, and hashing as distribution strategies
Explain why health checks are what make horizontal scaling real
Know which algorithm an interviewer expects for sticky vs stateless workloads

A load balancer fronts N backends and answers one question per request: which backend? The horizontal-scaling story only works if that choice keeps load even and routes around dead nodes. The algorithm and the health check are the two knobs that matter.

client -> LB -> pick(backends) -> backend
              health-checked pool only

Round-robin vs least-connections vs hashing

Round-robin rotates through backends in order. It needs no knowledge of backend load, only a rotating index, and it is trivially fair when requests are uniform. The moment request cost varies (one slow query, one fat upload), round-robin piles work onto whichever node drew the expensive requests, because it ignores actual load.
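
To make the skew concrete, here is a minimal sketch of a round-robin picker. The class name, backend names, and per-request costs are all made up for illustration; a real balancer would not know costs up front.

```python
from itertools import cycle


class RoundRobin:
    """Rotate through backends in a fixed order, ignoring their load."""

    def __init__(self, backends):
        self._ring = cycle(list(backends))

    def pick(self):
        return next(self._ring)


lb = RoundRobin(["a", "b", "c"])
picks = [lb.pick() for _ in range(6)]  # a, b, c, a, b, c

# Hypothetical request costs: every third request is expensive.
costs = [10, 1, 1, 10, 1, 1]
work = {"a": 0, "b": 0, "c": 0}
lb = RoundRobin(["a", "b", "c"])
for cost in costs:
    work[lb.pick()] += cost

print(work)  # each backend got two requests, but "a" did most of the work
```

Request counts come out even while the work does not, which is the failure mode described above.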

Least-connections picks the backend with the fewest in-flight requests. It self-corrects under skew: a node stuck on slow work stops attracting new requests. The cost is shared state — the LB must track live connection counts — and it can still misjudge when "connection count" is a poor proxy for "actual load."
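
A least-connections picker can be sketched as a counter per backend. This is an illustrative in-memory version under assumed names (LeastConnections, acquire, release); production balancers track this state differently and often add weights.

```python
class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.in_flight = {b: 0 for b in backends}

    def acquire(self):
        # On ties, min() returns the first backend in insertion order.
        backend = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[backend] += 1
        return backend

    def release(self, backend):
        self.in_flight[backend] -= 1


lb = LeastConnections(["a", "b", "c"])
slow = lb.acquire()  # a long-running request lands on "a" and is never released

fast = []
for _ in range(4):
    backend = lb.acquire()  # short requests: acquire, finish, release
    fast.append(backend)
    lb.release(backend)

print(slow, fast)  # none of the short requests go to the busy backend
```

The acquire/release pair is the "shared state" cost: every request start and finish has to update the counts.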

Hashing maps a key (client IP, session ID, cache key) to a backend deterministically. The same key always lands on the same node, which buys you session stickiness and cache locality. The trade-off is that hashing fights even distribution: a hot key or skewed key space overloads one node, and naive hash % N remaps most keys to a different node when N changes (the next lesson's whole problem).
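
The rehash cost is easy to measure. The sketch below assumes SHA-256 as the hash and invented session keys; it counts how many keys change backend when the pool grows from 4 to 5 nodes.

```python
import hashlib


def pick(key: str, n: int) -> int:
    """Map a key to a backend index with naive hash % N."""
    # hashlib is used because Python's built-in hash() for str is
    # randomized per process, so it would not be stable across restarts.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


keys = [f"session-{i}" for i in range(10_000)]  # hypothetical session IDs

# Same key, same pool size: always the same backend.
sticky = pick("session-42", 4) == pick("session-42", 4)

# Grow the pool from 4 to 5 and count keys that land somewhere new.
moved = sum(pick(k, 4) != pick(k, 5) for k in keys)
fraction_moved = moved / len(keys)

print(f"{fraction_moved:.0%} of keys changed backend")
```

With a uniform hash, a key keeps its backend only when h % 4 == h % 5, which happens for about one key in five, so roughly four in five keys move.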

round-robin     -> simple, fair only if requests are uniform
least-conns     -> adapts to skew, needs live load state
hash(key) % N   -> sticky + cache-friendly, but skews & rehashes

What health checks actually buy you

Adding a node doesn't add capacity until traffic reaches it; a dead node keeps failing requests until traffic stops reaching it. Health checks are the mechanism for both. The LB probes each backend (GET /healthz) and only the passing set is eligible for selection. This is the difference between "we have 10 replicas" and "we have 10 replicas serving traffic."
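
As a rough sketch, the eligible pool is just a filter over probe results. The addresses and the status table below are invented stand-ins for real GET /healthz responses, and the probe is a plain function so the example runs without a network.

```python
from typing import Callable, Iterable


def eligible(backends: Iterable[str], probe: Callable[[str], bool]) -> list[str]:
    """Return only the backends whose probe currently passes."""
    return [b for b in backends if probe(b)]


# Hypothetical probe results, standing in for GET /healthz responses.
status = {"10.0.0.1": True, "10.0.0.2": False, "10.0.0.3": True}


def probe(backend: str) -> bool:
    return status[backend]


pool = eligible(status, probe)  # the dead node is skipped

status["10.0.0.4"] = False       # new replica registered, still failing its probe
before = eligible(status, probe)  # no capacity added yet

status["10.0.0.4"] = True        # probe starts passing
after = eligible(status, probe)   # now it can receive traffic

print(pool, before, after)
```

Whatever selection algorithm runs, it should choose from this filtered pool rather than from the full list of registered backends.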

The grading nuance: distinguish liveness (is the process up?) from readiness (can it serve now? — warm caches, open DB pool, not draining). Routing to a live-but-not-ready node during a deploy is a classic self-inflicted outage.
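
One way to picture the distinction is as two separate predicates. The Backend fields below are illustrative assumptions drawn from the examples in the text (warm caches, open DB pool, not draining), not a standard API.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    process_up: bool = True
    cache_warm: bool = False
    db_pool_open: bool = False
    draining: bool = False

    def live(self) -> bool:
        """Liveness: is the process up at all?"""
        return self.process_up

    def ready(self) -> bool:
        """Readiness: can it serve a request right now?"""
        return (
            self.live()
            and self.cache_warm
            and self.db_pool_open
            and not self.draining
        )


def routable(backends):
    # Route on readiness, not liveness.
    return [b.name for b in backends if b.ready()]


# A hypothetical fleet in the middle of a deploy.
fleet = [
    Backend("old", cache_warm=True, db_pool_open=True, draining=True),
    Backend("new", cache_warm=False, db_pool_open=True),
    Backend("steady", cache_warm=True, db_pool_open=True),
]

print([b.name for b in fleet if b.live()])  # all three are live
print(routable(fleet))                       # only one is ready
```

Routing on live() alone would send traffic to the draining node and to the cold one, which is the mid-deploy outage described above.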

Stickiness is a liability, not a feature

Reach for hashing only when you genuinely need session affinity or cache locality. Sticky routing concentrates a hot user on one node and makes that node a single point of failure for every session pinned to it. Prefer stateless backends so any algorithm is safe.

Two balancers

Run it. Round-robin rotates blindly; least-connections routes away from the busy backend. Watch how the long-running request changes which one stays balanced.
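
A minimal sketch of such a comparison, under simplifying assumptions: two backends, one request arriving per tick, and made-up durations where the first request runs for 20 ticks and the rest for 1 tick each. The function names are illustrative.

```python
def simulate(pick, durations, n_backends=2):
    """Return (requests assigned, peak in-flight) per backend."""
    active = [[] for _ in range(n_backends)]  # finish ticks of in-flight requests
    assigned = [0] * n_backends
    peak = [0] * n_backends
    for tick, duration in enumerate(durations):
        # Retire requests that have finished by this tick.
        for reqs in active:
            reqs[:] = [t for t in reqs if t > tick]
        counts = [len(reqs) for reqs in active]
        i = pick(tick, counts)
        active[i].append(tick + duration)
        assigned[i] += 1
        peak[i] = max(peak[i], len(active[i]))
    return assigned, peak


def round_robin(tick, counts):
    # One request per tick, so the tick doubles as the rotation counter.
    return tick % len(counts)


def least_connections(tick, counts):
    # First backend with the fewest in-flight requests.
    return counts.index(min(counts))


durations = [20] + [1] * 9  # one long-running request, then nine short ones

rr = simulate(round_robin, durations)
lc = simulate(least_connections, durations)
print("round-robin      ", rr)
print("least-connections", lc)
```

Round-robin splits the request count evenly but keeps stacking short requests behind the long one; least-connections sends every short request to the idle backend, so its request counts look lopsided while in-flight load stays level.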

Knowledge check

Why does adding replicas only increase real capacity once health checks are in place?
