Measuring LLM Systems
Guardrails & regression gates
Lesson 5 of 5
What you'll learn
- Validate inputs and outputs with guardrails and allowlists
- Refuse unsafe actions before they reach a tool
- Block a deploy when a new eval score regresses past a threshold
Evals tell you how good a system is. Guardrails and gates decide what you let through. They are the enforcement layer: a guardrail runs at request time on a single input or output, and a regression gate runs at deploy time on the whole eval suite. Together they're the seatbelt and the crash test — one protects each trip, the other keeps unsafe builds off the road.
Runtime guardrails
A guardrail is a check that can reject. On the input side: length limits, schema validation, prompt-injection patterns. On the output side: refusing to emit secrets, validating that a tool call's arguments are well-formed, and allowlisting which actions an agent may take. Default to deny — an agent should only call tools you explicitly permit.
const ALLOWED_TOOLS = new Set(["search", "read_file"]);
const allowed = (tool) => ALLOWED_TOOLS.has(tool); // delete_file -> false
The cheapest production incident is the one a guardrail refuses before it happens. A deterministic allowlist beats hoping the model never decides to call delete_file.
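The same default-deny idea extends to the other checks listed above. The following is a minimal sketch, not a definitive implementation: the names `MAX_INPUT_CHARS`, `TOOL_SCHEMAS`, `checkInput`, and `checkToolCall`, the 4000-character limit, and the argument shapes for `search` and `read_file` are all assumptions made for illustration. It does not attempt prompt-injection detection.

```javascript
// Hypothetical limit and tool schemas, chosen only for illustration.
const MAX_INPUT_CHARS = 4000;
const TOOL_SCHEMAS = new Map([
  ["search", { query: "string" }],
  ["read_file", { path: "string" }],
]);

// Input guardrail: reject non-string, empty, or oversized input.
function checkInput(text) {
  if (typeof text !== "string" || text.length === 0) {
    return { ok: false, reason: "input must be a non-empty string" };
  }
  if (text.length > MAX_INPUT_CHARS) {
    return { ok: false, reason: "input exceeds length limit" };
  }
  return { ok: true };
}

// Output guardrail: default deny on the tool name, then check that
// every expected argument is present with the right type and that
// no unexpected arguments were supplied.
function checkToolCall(call) {
  const schema = TOOL_SCHEMAS.get(call.tool);
  if (!schema) {
    return { ok: false, reason: `tool not allowed: ${call.tool}` };
  }
  const args = call.args ?? {};
  for (const [name, type] of Object.entries(schema)) {
    if (typeof args[name] !== type) {
      return { ok: false, reason: `bad or missing argument: ${name}` };
    }
  }
  for (const name of Object.keys(args)) {
    if (!Object.keys(schema).includes(name)) {
      return { ok: false, reason: `unexpected argument: ${name}` };
    }
  }
  return { ok: true };
}

console.log(checkToolCall({ tool: "delete_file", args: { path: "a.txt" } }).ok); // false
console.log(checkToolCall({ tool: "read_file", args: { path: "a.txt" } }).ok);   // true
```

Returning a reason alongside the decision keeps the refusal auditable: a log line can say which check rejected the call rather than only that something was blocked.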
Regression gates
A regression gate sits in CI. It runs the pinned eval suite against the new build, compares the aggregate score to a baseline, and fails the pipeline if the score has dropped by more than a set tolerance. This is what makes evals load-bearing instead of decorative: a prompt tweak that quietly drops accuracy from 0.92 to 0.78 never reaches users, because the gate turns red and blocks the merge.
A gate you can skip is not a gate
Allow a tiny tolerance for sampling noise, but never make the gate advisory. The first time a regression is waved through "just this once," the eval stops protecting anything. Red means blocked.
Run it. The gate fails the deploy when new accuracy drops below baseline minus tolerance, and the guardrail refuses a disallowed tool call. Both return a clear pass/block decision.
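One way to sketch the two decisions described here, under stated assumptions: the function names `regressionGate` and `guardToolCall`, the `{ decision, reason }` return shape, and the 0.01 tolerance are hypothetical. The 0.92 baseline and 0.78 score are the figures used earlier in the lesson.

```javascript
const ALLOWED_TOOLS = new Set(["search", "read_file"]);

// Runtime guardrail: one decision per tool call, made at request time.
function guardToolCall(tool) {
  return ALLOWED_TOOLS.has(tool)
    ? { decision: "pass" }
    : { decision: "block", reason: `tool not on allowlist: ${tool}` };
}

// Regression gate: one decision per build, made at deploy time.
// Blocks when the new accuracy falls below baseline minus tolerance.
function regressionGate(newAccuracy, baseline, tolerance) {
  const floor = baseline - tolerance;
  return newAccuracy < floor
    ? { decision: "block", reason: `accuracy ${newAccuracy} is below floor ${floor.toFixed(2)}` }
    : { decision: "pass" };
}

// Assumed tolerance of 0.01 to absorb sampling noise.
console.log(regressionGate(0.78, 0.92, 0.01).decision);  // block
console.log(regressionGate(0.915, 0.92, 0.01).decision); // pass
console.log(guardToolCall("delete_file").decision);      // block
console.log(guardToolCall("search").decision);           // pass
```

In a CI job, a "block" decision would typically be turned into a non-zero exit code so the pipeline fails rather than merely printing a warning.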
What is the key difference between a runtime guardrail and a regression gate?