Agentic AI guardrails: stopping the while(true) from burning your token budget
Four guardrails that turn a vibes-loop into something you'd actually run in production: budget envelope, retry curves, break conditions, and a real fallback chain.
Last time I said agentic AI is mostly a while(true) loop with vibes. That post got a few "what's the fallback?" replies, and the honest answer is: the fallback is the loop body. Without it, you have a thing that consumes money in a circle until you SSH in and kill -9 it.
This post is the technical follow-up. Four guardrails that turn the vibes-loop into something you'd actually run on production traffic.
Guard 1 — Budget envelope before the first call
Most agent failures aren't logical. They're financial. Token cost doesn't matter at $0.0001 per request. It starts mattering when a single user query spawns 47 tool calls because the planner decided "let me also check the weather while I'm here."
The fix is a hard envelope at the entry point. Not a soft "warn at 80%" thing — a hard ceiling that throws.
    // Thrown when the running total crosses the cap.
    class BudgetExceeded extends Error {
      constructor(readonly spent: number, readonly cap: number) {
        super(`${spent} > ${cap}`);
        this.name = "BudgetExceeded";
      }
    }

    class BudgetGuard {
      private spent = 0;
      constructor(private readonly cap: number) {}

      // Record usage after a model call; throws once the total exceeds the cap.
      charge(inputTokens: number, outputTokens: number) {
        this.spent += inputTokens + outputTokens;
        if (this.spent > this.cap) {
          throw new BudgetExceeded(this.spent, this.cap);
        }
      }

      remaining() {
        return Math.max(0, this.cap - this.spent);
      }
    }

    const budget = new BudgetGuard(50_000); // tokens, not dollars; convert at billing time

Then every model call charges the guard before the loop continues. The first agent I ran in production decided to refactor a service while it was supposed to look up an order. The bill the next morning was a hundred dollars on what should have been a one-cent query. After this guard, the same query throws at hop 8 with BudgetExceeded: 51200 > 50000, returns a polite "I'm out of room" message, and the worst case is a 10-cent mistake.
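The guard counts tokens, so the dollar figure only appears when you convert. Here is a minimal sketch of that conversion. The Pricing shape, the costUsd name, and the rates are placeholders I made up for illustration, not any provider's real prices. Input and output are priced separately because providers typically bill them at different rates.

```typescript
// Hypothetical pricing shape: USD per million tokens. Not a real rate card.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Convert token counts into dollars using the given (assumed) rates.
function costUsd(inputTokens: number, outputTokens: number, pricing: Pricing): number {
  return (
    (inputTokens * pricing.inputPerMTok + outputTokens * pricing.outputPerMTok) / 1_000_000
  );
}

// Placeholder numbers for illustration only.
const examplePricing: Pricing = { inputPerMTok: 1, outputPerMTok: 5 };

// A run that used the whole 50_000-token cap, split 40k input / 10k output.
const worstCaseUsd = costUsd(40_000, 10_000, examplePricing);
console.log(worstCaseUsd.toFixed(2)); // 0.09
```

With these placeholder rates, a run that exhausts the cap costs about nine cents. Plug in your own rates to see what your ceiling is actually worth before you ship it.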
Guard 2 — Retry curves that recover, not poke
When you Google "exponential backoff" you get a 5-line snippet that doubles a delay between retries. That's not what backoff is for. The point isn't to space out polite retries. The point is to give the downstream service time to recover.
A real curve adds three things to the doubling:
- Jitter (because synchronized retries from 1000 instances kill the same service)
- Cap (because waiting 17 minutes for an API call is worse than failing)
- Circuit breaker (because if 50 in a row failed, the 51st isn't going to succeed)
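    const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

    // Implements the jitter and the cap. A circuit breaker is not shown here.
    // isRetryable(err) is your own classifier; it is not defined in this snippet.
    async function callWithBackoff<T>(
      fn: () => Promise<T>,
      { maxAttempts = 5, baseMs = 200, capMs = 8_000 } = {},
    ): Promise<T> {
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          return await fn();
        } catch (err) {
          // Give up on non-retryable errors and on the final attempt.
          if (!isRetryable(err) || attempt === maxAttempts) throw err;
          // Exponential delay, capped at capMs.
          const delay = Math.min(capMs, baseMs * 2 ** (attempt - 1));
          // Jitter: wait between 50% and 100% of the computed delay.
          const jittered = delay * (0.5 + Math.random() / 2);
          await sleep(jittered);
        }
      }
      throw new Error("unreachable");
    }

isRetryable matters more than the curve. 429s and 503s are retryable. 400s and 401s aren't: retrying is just making the same mistake louder.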
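Two pieces of this guard never appear in code: isRetryable and the circuit breaker. Here is a rough sketch of both. It assumes thrown errors carry the HTTP status on a status field, which depends on your HTTP client. The CircuitBreaker class, its defaults (50 failures, 30 seconds of cooldown), and the injectable clock are illustrative choices of mine, not a reference implementation.

```typescript
// Assumption: thrown errors expose the HTTP status on a `status` field.
interface HttpLikeError {
  status?: number;
}

// 429 and 503 are worth retrying; everything else fails fast in this sketch.
function isRetryable(err: unknown): boolean {
  const status = (err as HttpLikeError | null | undefined)?.status;
  return status === 429 || status === 503;
}

// Minimal circuit breaker: after `threshold` consecutive failures, reject
// calls immediately until `cooldownMs` has passed, then let a trial call through.
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold = 50, cooldownMs = 30_000, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null) {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open: skipping call");
      }
      // Cooldown elapsed: allow a trial call. The failure count is kept,
      // so a single failed trial re-opens the circuit right away.
      this.openedAt = null;
    }
    try {
      const result = await fn();
      this.consecutiveFailures = 0; // any success closes the circuit
      return result;
    } catch (err) {
      this.consecutiveFailures++;
      if (this.consecutiveFailures >= this.threshold) {
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

The breaker would wrap the backoff call, not the other way around: one breaker per downstream service, shared across requests, so the 51st caller fails fast instead of queueing another five attempts. This sketch doesn't limit concurrent trial calls after the cooldown, which a production breaker probably should.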
Guard 3 — When to break the loop
The hardest part of while(true) isn't writing the while. It's writing the break. Five conditions belong in the body:
    const started = Date.now();
    let lowConfidenceStreak = 0;
    let outputUnchanged = 0;

    while (true) {
      if (budget.remaining() < 1_000) break;       // 1. financial
      if (Date.now() - started > 30_000) break;    // 2. wall clock
      if (lowConfidenceStreak >= 3) break;         // 3. model said "I'm not sure" three times
      if (outputUnchanged >= 2) break;             // 4. it's just looping
      if (cancellationToken.aborted) break;        // 5. user gave up

      const step = await agent.step({ budget: budget.remaining() });
      budget.charge(step.inputTokens, step.outputTokens);
      if (step.isDone) break;

      // Streak counters reset to 0 as soon as a step looks healthy again.
      lowConfidenceStreak = step.confidence < 0.4 ? lowConfidenceStreak + 1 : 0;
      outputUnchanged = sameAsLast(step.output) ? outputUnchanged + 1 : 0;
    }

The "output unchanged" check is the one that catches most production loops. An agent that's stuck thinking will often produce the same partial answer 4 or 5 times in a row, hoping the next sample is the one. It isn't. Break out, give up, return what you have.
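The loop leans on sameAsLast without showing it. One way it could look, as a minimal sketch: remember the previous output and compare after normalizing whitespace and case. The makeSameAsLast factory and the normalization rules are my assumptions. An agent that rephrases the same stuck answer would slip past this and need a similarity measure instead of equality.

```typescript
// Returns a stateful checker that reports whether the current output
// matches the previous one after light normalization.
function makeSameAsLast(): (output: string) => boolean {
  let previous: string | null = null;
  return (output: string): boolean => {
    // Assumed normalization: trim, collapse whitespace runs, ignore case.
    const normalized = output.trim().replace(/\s+/g, " ").toLowerCase();
    const same = previous !== null && normalized === previous;
    previous = normalized;
    return same;
  };
}

// One checker per agent run, so state does not leak between queries.
const sameAsLast = makeSameAsLast();
```

Creating the checker per run matters: a module-level one shared across requests would compare one user's output against another's.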
The agent I'm thinking of right now retried 47 times because nobody told it to give up. The 48th was the same as the 47th. The retry cost in tokens was higher than the original answer would have been at GPT-4 prices.
Guard 4 — Fallback chain, not fallback message
"I tried my best" is not a fallback. It's surrender. Real fallback is a chain:
    async function answer(query: string): Promise<string> {
      try {
        return await runAgent(query, { model: "claude-opus" });
      } catch (err) {
        if (err instanceof BudgetExceeded) {
          // Cheaper model with a tighter loop
          return await runAgent(query, { model: "claude-haiku", maxSteps: 3 });
        }
        if (err instanceof TimeExceeded) {
          // No agent: just a direct LLM call with no tools
          return await directCompletion(query);
        }
        // Everything else: rule-based
        return ruleBased(query) ?? "I don't have a confident answer.";
      }
    }

The chain matters because the failures are different. Budget overrun is "I need to think cheaper." Time overrun is "I need to think simpler." Logic error is "I need to not think." Each branch should retry on the cause, not the symptom.
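The chain only works if something actually throws TimeExceeded. One way to get there, as a rough sketch: race the work against a timer. The TimeExceeded shape and the withDeadline helper are hypothetical names for illustration. Note that the race only stops waiting. It does not cancel the underlying call, so a real version would also pass an abort signal down to the agent.

```typescript
// Hypothetical error type for wall-clock overruns.
class TimeExceeded extends Error {
  readonly limitMs: number;
  constructor(limitMs: number) {
    super(`exceeded ${limitMs}ms`);
    this.name = "TimeExceeded";
    this.limitMs = limitMs;
  }
}

// Reject with TimeExceeded if `work` has not settled within `limitMs`.
function withDeadline<T>(work: Promise<T>, limitMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new TimeExceeded(limitMs)), limitMs);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
```

Wrapping the first runAgent call in withDeadline would let the TimeExceeded branch of the chain fire on its own, instead of relying on the loop to notice the clock.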
Closing
Agentic AI without guardrails is autocomplete in a circle. The hardest part isn't writing the while(true). It's writing the break.
If you take one thing away: put the budget envelope around the loop before anything else. The other three guards are quality. Budget is survival.
Last note: I'm running these patterns on a system that picks tools for a marketplace catalog of 7 million products. It runs through about $40/month in API costs on traffic that would have cost $4,000/month two iterations ago. The difference isn't smarter agents. It's better breaks.