Published 18 April 2026 · 9 min read

Agentic AI is mostly while(true) with vibes

Production lessons from running autonomous agents in long-running loops, fallback patterns that actually work, and the day your agent decides to retry 47 times.

agentic-ai production engineering-lessons

The first time I shipped an agent loop to production, it worked beautifully for three hours, then quietly racked up a $312 invoice retrying the same tool call forty-seven times. The upstream API had started returning a politely worded 200 OK that said "service temporarily unavailable" in the body. The model, helpful as ever, decided this was a transient hiccup. Forty-seven times in a row.

You learn a lot about distributed systems from agents. None of it is new — it's all the same lessons we learned from cron jobs in 2008 — but the model adds a special kind of confidence that makes the failure modes louder.

The naive shape

Every agentic-AI tutorial starts the same way. You write a loop:

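async function agentLoop(task: string) {
  const messages: Message[] = [{ role: "user", content: task }];

  while (true) {
    const response = await llm.complete({ messages, tools });

    if (response.stop_reason === "end_turn") {
      return response.content;
    }

    // Keep the assistant turn in the history; most chat APIs expect it before the tool results.
    messages.push({ role: "assistant", content: response.content, tool_calls: response.tool_calls });

    for (const toolCall of response.tool_calls) {
      const result = await runTool(toolCall);
      messages.push({ role: "tool", content: result });
    }
  }
}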

This is fine for the demo. The model takes a few turns, decides it's done, returns. You ship it.

In production, this loop is a money cannon pointed at your wallet.

Where it breaks

A short, non-exhaustive list of how I have personally watched this loop misbehave:

The model decided a function call was "almost right" and tried twelve variations of the same arguments before giving up.
A tool returned a 500. The model retried. The tool returned a 500. The model retried with the same arguments. We did this 91 times before our budget alert went off.
The model encountered a tool result it didn't understand, hallucinated a second tool that didn't exist, and then tried to recover by calling the hallucinated tool again.
The user's task was already done by step two, but the model kept calling search tools "just to confirm" until it hit the 100k-token context limit and crashed.
Two agents in a multi-agent system started ping-ponging tool calls at each other. They were polite about it.

The core problem isn't the loop. It's that the model has no opinion about how long it should run. It will happily keep going until the universe heat-deaths, or until you bill it for the privilege.
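Several of these failures can be defused before the result ever reaches the model, by telling it plainly whether a retry is worth anything. What follows is a minimal sketch, not a drop-in implementation: `ToolOutcome`, `ToolHandler`, `classify`, `dispatch`, and the list of failure phrases are all hypothetical names and values chosen for illustration.

```typescript
// Hypothetical shapes for illustration; adapt them to your own tool layer.
type ToolOutcome =
  | { ok: true; content: string }
  | { ok: false; retryable: boolean; error: string };

type ToolHandler = (args: unknown) => Promise<{ status: number; body: string }>;

// Phrases that mean "failed" even when the status code says otherwise.
// This list is an assumption; tune it to the upstream APIs you actually call.
const SOFT_FAILURES = [/temporarily unavailable/i, /rate limit/i];

function classify(status: number, body: string): ToolOutcome {
  if (status >= 500) return { ok: false, retryable: true, error: `HTTP ${status}` };
  if (status >= 400) return { ok: false, retryable: false, error: `HTTP ${status}` };
  if (SOFT_FAILURES.some((re) => re.test(body))) {
    // A 200 OK whose body describes an outage is still a failure.
    return { ok: false, retryable: true, error: `soft failure: ${body.slice(0, 80)}` };
  }
  return { ok: true, content: body };
}

async function dispatch(
  registry: Map<string, ToolHandler>,
  name: string,
  args: unknown,
): Promise<ToolOutcome> {
  const handler = registry.get(name);
  if (!handler) {
    // A hallucinated tool will never start existing, so mark it non-retryable.
    return { ok: false, retryable: false, error: `unknown tool: ${name}` };
  }
  const { status, body } = await handler(args);
  return classify(status, body);
}
```

The `retryable` flag is the useful part: the loop can refuse to re-run a non-retryable call no matter how optimistic the model feels about it.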

What actually helps

Three controls, in order of how much pain they each saved me.

1. Budget — not retry count

Counting tool calls alone is the wrong unit. Token spend is the right unit, because that's what hurts; a tool-call cap is still worth keeping as a cheap backstop. Track all of it inside the loop and short-circuit:

type Budget = {
  maxInputTokens: number;
  maxOutputTokens: number;
  maxToolCalls: number;
  spentInput: number;
  spentOutput: number;
  toolCalls: number;
};
 
function overBudget(b: Budget): boolean {
  return (
    b.spentInput > b.maxInputTokens ||
    b.spentOutput > b.maxOutputTokens ||
    b.toolCalls > b.maxToolCalls
  );
}
 
async function agentLoop(task: string, budget: Budget) {
  const messages: Message[] = [{ role: "user", content: task }];

  while (!overBudget(budget)) {
    const response = await llm.complete({ messages, tools });
    budget.spentInput += response.usage.input_tokens;
    budget.spentOutput += response.usage.output_tokens;

    if (response.stop_reason === "end_turn") return response.content;

    // Keep the assistant turn in the history; most chat APIs expect it before the tool results.
    messages.push({ role: "assistant", content: response.content, tool_calls: response.tool_calls });

    for (const call of response.tool_calls) {
      budget.toolCalls += 1;
      const result = await runTool(call);
      messages.push({ role: "tool", content: result });
    }
  }

  // Don't just throw; return what we have, annotated.
  return {
    status: "budget_exceeded",
    partial: messages.at(-1),
    spent: budget,
  };
}

The cheap insight: when the budget runs out, don't throw. Return a structured partial result. Throwing unwinds the caller and discards whatever the loop had gathered so far. A partial result with status: "budget_exceeded" lets the caller decide what to do: escalate to a human, fall back to a simpler path, or just log it.
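Here is roughly what "the caller decides" can look like. This is a sketch under assumptions: it treats the loop's return value as either a plain string answer or the budget_exceeded object, and `LoopResult`, `NextStep`, `decideNextStep`, and the threshold of ten tool calls are hypothetical, picked only to make the branching visible.

```typescript
type Spent = { spentInput: number; spentOutput: number; toolCalls: number };

// Assumed shape of what the budgeted loop returns.
type LoopResult =
  | string
  | { status: "budget_exceeded"; partial: unknown; spent: Spent };

type NextStep =
  | { action: "deliver"; answer: string }
  | { action: "escalate"; note: string }
  | { action: "fallback"; note: string };

// Arbitrary illustrative threshold: this many tool calls suggests the agent was stuck.
const STUCK_THRESHOLD = 10;

function decideNextStep(result: LoopResult): NextStep {
  if (typeof result === "string") {
    return { action: "deliver", answer: result };
  }
  const { spent } = result;
  const note =
    `stopped after ${spent.toolCalls} tool calls, ` +
    `${spent.spentInput + spent.spentOutput} tokens`;
  // A stuck agent goes to a human; a merely expensive one goes to the simpler path.
  return spent.toolCalls >= STUCK_THRESHOLD
    ? { action: "escalate", note }
    : { action: "fallback", note };
}
```

None of these branches is available to a caller that only ever sees an exception.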

2. Tool calls are not idempotent. Make them.

When the model retries a tool, you don't want side effects to multiply. Wrap each tool with an idempotency key derived from its arguments, and cache the result inside the loop:

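function toolKey(name: string, args: unknown): string {
  // stableStringify is assumed to serialize with sorted keys, so argument order doesn't change the key.
  return `${name}:${stableStringify(args)}`;
}

// One cache per agent run, so results never leak between unrelated tasks.
function createMemoizedRunner() {
  const toolCache = new Map<string, ToolResult>();

  return async function runToolMemoized(call: ToolCall) {
    const key = toolKey(call.name, call.arguments);
    const hit = toolCache.get(key);
    if (hit !== undefined) {
      // Tag the result so the model can see it is repeating itself.
      return { ...hit, _cached: true };
    }
    const result = await runTool(call);
    toolCache.set(key, result);
    return result;
  };
}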

When the model calls the same tool with the same arguments twice in one loop, you serve the cached result and tag it. The model usually figures out from the cache hint that it should try something else. Sometimes it doesn't. That's what the budget is for.
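The key only works if identical arguments always serialize to the same string, which plain JSON.stringify does not guarantee when the model reorders object keys. One possible `stableStringify`, written as a sketch that assumes tool arguments are plain JSON values (no functions, bigints, or cyclic references):

```typescript
// Sketch of a key-order-independent serializer.
// Assumes JSON-like input: primitives, arrays, and plain objects.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null"; // undefined has no JSON form
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so keep it as-is.
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .sort() // sorted keys make the output independent of insertion order
    .filter((key) => obj[key] !== undefined) // match JSON.stringify, which drops undefined fields
    .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
  return `{${entries.join(",")}}`;
}
```

With this, a call with arguments { q: "cats", limit: 5 } and one with { limit: 5, q: "cats" } land on the same cache entry, which is usually what you want when the model is rephrasing the same request.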

3. The fallback is a real product feature

Every agent loop needs a final answer when it runs out of budget, runs out of patience, or hits an unrecoverable tool error. The fallback is not "throw an exception". The fallback is a humble, structured response that says what was attempted, what was learned, and what the agent thinks the next step should be.

type AgentResult =
  | { status: "ok"; answer: string; trace: TraceEntry[] }
  | { status: "partial"; bestGuess: string; missing: string[]; trace: TraceEntry[] }
  | { status: "failed"; reason: string; trace: TraceEntry[] };

If your agent returns { status: "failed", reason: "I tried my best" }, that's not a bug. That's the loop knowing when to break. The hard part of agentic AI is exactly this: knowing when to stop trying. The model will not tell you. You have to.
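One way to guarantee the loop always ends in one of those three shapes is to funnel every exit through a single function. A sketch, with the caveat that `TraceEntry`, `LoopExit`, and `toAgentResult` are hypothetical; the fields on them are assumptions about what your loop happens to know when it stops.

```typescript
// Hypothetical trace record; use whatever your loop already logs.
type TraceEntry = { tool: string; ok: boolean; summary: string };

type AgentResult =
  | { status: "ok"; answer: string; trace: TraceEntry[] }
  | { status: "partial"; bestGuess: string; missing: string[]; trace: TraceEntry[] }
  | { status: "failed"; reason: string; trace: TraceEntry[] };

// Hypothetical description of why the loop stopped.
type LoopExit =
  | { kind: "done"; answer: string }
  | { kind: "budget"; draft: string | null; openQuestions: string[] }
  | { kind: "error"; message: string };

function toAgentResult(exit: LoopExit, trace: TraceEntry[]): AgentResult {
  switch (exit.kind) {
    case "done":
      return { status: "ok", answer: exit.answer, trace };
    case "budget":
      // Out of budget with a draft in hand is "partial"; with nothing, it is "failed".
      return exit.draft !== null
        ? { status: "partial", bestGuess: exit.draft, missing: exit.openQuestions, trace }
        : { status: "failed", reason: "budget exhausted before any usable answer", trace };
    case "error":
      return { status: "failed", reason: exit.message, trace };
  }
}
```

Because the switch is exhaustive, adding a new way for the loop to stop becomes a compile error until someone decides what the user should see.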

Trade-offs

You don't need any of this for a five-turn agent that does a single search and summarises. You need every line of it the moment the agent calls more than one tool, runs for more than a minute, or costs more than a cent per invocation.

The pattern scales down to nothing — budget=Infinity is a no-op — and scales up to running for hours. The cost of writing it is small. The cost of not writing it is, in my case, a one-line item on an invoice that said "API usage" and a four-figure number next to it.
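The "scales down to nothing" part can be made concrete with a constructor whose limits default to Infinity. A small sketch that mirrors the Budget type and overBudget check from earlier; `makeBudget` is a hypothetical helper, not something the loop requires.

```typescript
type Budget = {
  maxInputTokens: number;
  maxOutputTokens: number;
  maxToolCalls: number;
  spentInput: number;
  spentOutput: number;
  toolCalls: number;
};

type Limits = Pick<Budget, "maxInputTokens" | "maxOutputTokens" | "maxToolCalls">;

// Any limit left unspecified defaults to Infinity, which never trips the check.
function makeBudget(limits: Partial<Limits> = {}): Budget {
  return {
    maxInputTokens: Infinity,
    maxOutputTokens: Infinity,
    maxToolCalls: Infinity,
    ...limits,
    spentInput: 0,
    spentOutput: 0,
    toolCalls: 0,
  };
}

function overBudget(b: Budget): boolean {
  return (
    b.spentInput > b.maxInputTokens ||
    b.spentOutput > b.maxOutputTokens ||
    b.toolCalls > b.maxToolCalls
  );
}
```

A demo can call makeBudget() and behave exactly like the naive loop; production passes real numbers and gets the short-circuit for free.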

The summary your manager wants

Agentic AI is mostly a while(true) loop with vibes. The vibes are the easy part. The loop is also the easy part. The hard part is the budget, the idempotency, and the fallback — three things every backend engineer has built before, for cron jobs, in 2008.

The model adds nothing new to distributed systems. It just makes the failures more expressive.