Measuring LLM Systems

Guardrails & regression gates

Lesson 5 of 5

What you'll learn

Validate inputs and outputs with guardrails and allowlists
Refuse unsafe actions before they reach a tool
Block a deploy when a new eval score regresses past a threshold

Evals tell you how good a system is. Guardrails and gates decide what you let through. They are the enforcement layer: a guardrail runs at request time on a single input or output, and a regression gate runs at deploy time on the whole eval suite. Together they're the seatbelt and the crash test: one protects each trip, the other keeps unsafe builds off the road.

Runtime guardrails

A guardrail is a check that can reject. On the input side, that means length limits, schema validation, and prompt-injection pattern checks. On the output side, it means refusing to emit secrets, validating that a tool call's arguments are well-formed, and allowlisting which actions an agent may take. Default to deny: an agent should call only the tools you explicitly permit.

const ALLOWED_TOOLS = new Set(["search", "read_file"]);
const allowed = (tool) => ALLOWED_TOOLS.has(tool); // delete_file -> false

The cheapest production incident is the one a guardrail refuses before it happens. A deterministic allowlist beats hoping the model never decides to call delete_file.
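The input-side and argument checks described above can be sketched in the same style. This is a minimal illustration, not a complete defense: `MAX_INPUT_CHARS`, `INJECTION_PATTERNS`, `TOOL_SCHEMAS`, `checkInput`, and `checkToolCall` are hypothetical names, the length limit and regex patterns are placeholder assumptions, and pattern matching alone catches only the crudest injection attempts.

```javascript
// Hypothetical limits and patterns; tune these for your own system.
const MAX_INPUT_CHARS = 4000;
const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior) instructions/i,
  /reveal (your )?system prompt/i,
];

// Input guardrail: reject before the text ever reaches the model.
function checkInput(text) {
  if (typeof text !== "string" || text.length === 0) {
    return { ok: false, reason: "input must be a non-empty string" };
  }
  if (text.length > MAX_INPUT_CHARS) {
    return { ok: false, reason: "input too long" };
  }
  if (INJECTION_PATTERNS.some((p) => p.test(text))) {
    return { ok: false, reason: "possible prompt injection" };
  }
  return { ok: true };
}

// Output guardrail: allowlist the tool, then validate its arguments.
// Each entry maps a permitted tool to its required argument types.
const TOOL_SCHEMAS = {
  search: { query: "string" },
  read_file: { path: "string" },
};

function checkToolCall(call) {
  // Object.hasOwn avoids matching inherited keys such as "constructor".
  if (!call || !Object.hasOwn(TOOL_SCHEMAS, call.tool)) {
    return { ok: false, reason: `tool not allowed: ${call?.tool}` };
  }
  const schema = TOOL_SCHEMAS[call.tool];
  const args = call.args ?? {};
  for (const [name, type] of Object.entries(schema)) {
    if (typeof args[name] !== type) {
      return { ok: false, reason: `bad argument: ${name}` };
    }
  }
  // Default deny also applies to arguments the schema does not list.
  const extra = Object.keys(args).filter((k) => !Object.hasOwn(schema, k));
  if (extra.length > 0) {
    return { ok: false, reason: `unexpected argument: ${extra[0]}` };
  }
  return { ok: true };
}
```

Every check returns the same `{ ok, reason }` shape, so the caller can log the reason and refuse without special-casing each guardrail.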

Regression gates

A regression gate sits in CI. It runs the pinned eval suite against the new build, compares the aggregate score to a baseline, and fails the pipeline if the score has dropped by more than a set tolerance. This is what makes evals load-bearing instead of decorative: a prompt tweak that quietly drops accuracy from 0.92 to 0.78 never reaches users, because the gate turns red and blocks the merge.
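As a rough sketch of that comparison, the gate reduces to one aggregate and one inequality. The names `accuracy`, `regressionGate`, and `exitCodeFor` are hypothetical, the per-case `passed` field is an assumed result shape, and the 0.02 tolerance is only an illustrative value.

```javascript
// Aggregate score: fraction of eval cases that passed.
function accuracy(results) {
  if (results.length === 0) throw new Error("eval suite produced no results");
  return results.filter((r) => r.passed).length / results.length;
}

// Block when the candidate falls below baseline minus tolerance.
function regressionGate({ baseline, candidate, tolerance }) {
  const blocked = candidate < baseline - tolerance;
  return { decision: blocked ? "block" : "pass", baseline, candidate };
}

// CI fails a step on a non-zero exit code, so map the decision to one.
function exitCodeFor(result) {
  return result.decision === "block" ? 1 : 0;
}

const result = regressionGate({ baseline: 0.92, candidate: 0.78, tolerance: 0.02 });
console.log(result.decision, exitCodeFor(result)); // block 1
```

In a real pipeline the script would end with something like `process.exitCode = exitCodeFor(result)`, and the baseline would be read from a stored artifact rather than hard-coded.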

A gate you can skip is not a gate

Allow a tiny tolerance for sampling noise, but never make the gate advisory. The first time a regression is waved through "just this once," the eval stops protecting anything. Red means blocked.
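How tiny the tolerance can be depends on how noisy the score is. One rough way to size it, assuming the eval cases are independent pass/fail trials, is the standard error of a proportion, sqrt(p(1 - p) / n). The helper names below and the choice of two standard errors are assumptions for illustration, not a rule from this lesson.

```javascript
// Standard error of an accuracy measured on n independent pass/fail cases.
function standardError(accuracy, n) {
  return Math.sqrt((accuracy * (1 - accuracy)) / n);
}

// Hypothetical heuristic: tolerate about k standard errors of noise.
function suggestTolerance(baseline, n, k = 2) {
  return k * standardError(baseline, n);
}

console.log(suggestTolerance(0.92, 200).toFixed(3));  // 0.038
console.log(suggestTolerance(0.92, 2000).toFixed(3)); // 0.012
```

Under these assumptions, a 200-case suite cannot support a tolerance much tighter than a few points without failing on noise alone, so a tight gate needs a larger pinned suite.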

A regression gate plus an output guardrail

The gate fails the deploy when new accuracy drops below baseline minus tolerance, and the guardrail refuses a disallowed tool call. Both return a clear pass/block decision.


Knowledge check

What is the key difference between a runtime guardrail and a regression gate?
