For two years the story about AI costs was reassuring: per-token prices kept falling, so the bill would take care of itself. In 2026 that story fell apart. Per-token prices did keep dropping — but consumption exploded, because agents don't make one call, they make hundreds, in loops, across tools, unattended.
The result is a new line item that lands on the CFO's desk with no warning. Teams have reported overrunning their annual token budgets threefold by spring. In one widely cited incident, a handful of agents stuck in a recursive loop ran up tens of thousands of dollars in a single weekend. FinOps, the discipline the industry built for cloud spend, is now being rewritten for tokens, and most organizations admit they don't yet have the granularity to govern it.
Why agent spend is different
Traditional API cost is roughly proportional to traffic you can see and cap. Agentic spend isn't. Three things make it dangerous:
Loops. An agent that retries, re-plans, or calls itself can multiply token usage without a human ever clicking anything. A bug that would once have thrown an error now quietly bills you by the second.
Fan-out. One user request can spawn many model calls — retrieval, reasoning, tool calls, sub-agents. The cost of a "single" interaction is no longer a single number.
Invisibility. By the time spend shows up on a provider invoice, the money is gone. Monthly billing is a postmortem, not a control.
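To make the fan-out and loop arithmetic concrete, here is a minimal sketch in Python. The prices, step names, and token counts are invented for illustration and do not reflect any provider's actual rates. It adds up the cost of one user request that fans out into four model calls, then the cost of the same request when an agent repeats it 200 times.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; real prices vary by provider and model.
PRICE_PER_MILLION = {"input": 3.00, "output": 15.00, "cache": 0.30}


@dataclass
class ModelCall:
    step: str            # illustrative label, e.g. "retrieval" or "sub_agent"
    input_tokens: int
    output_tokens: int
    cache_tokens: int = 0


def call_cost(call: ModelCall) -> float:
    """Cost in USD of a single model call, priced per token type."""
    return (
        call.input_tokens * PRICE_PER_MILLION["input"]
        + call.output_tokens * PRICE_PER_MILLION["output"]
        + call.cache_tokens * PRICE_PER_MILLION["cache"]
    ) / 1_000_000


def request_cost(calls: list[ModelCall]) -> float:
    """Cost of one user request: the sum over every call it fans out into."""
    return sum(call_cost(c) for c in calls)


# One "single" interaction that fans out into four model calls (made-up numbers).
one_pass = [
    ModelCall("retrieval", 2_000, 200),
    ModelCall("reasoning", 6_000, 1_000),
    ModelCall("tool_call", 3_000, 300),
    ModelCall("sub_agent", 8_000, 1_500),
]

single = request_cost(one_pass)        # one pass through the fan-out
looped = request_cost(one_pass * 200)  # the same pass repeated by a looping agent

print(f"single pass: ${single:.3f}")
print(f"200 loops:   ${looped:.2f}")
```

Under these assumed numbers a single pass costs about ten cents, which looks harmless on its own. The loop multiplies it by 200 without any change in visible traffic, and nothing in the per-call price signals that anything is wrong.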
Why dashboards aren't enough
Most of the tooling that has appeared to address this is observability — dashboards that show you where the money went. That's useful for the retrospective, but it's the wrong altitude for the problem. A dashboard tells you a runaway agent cost you $40,000 last night. It doesn't stop the next one.
The distinction that matters is observe versus enforce. Observability lives next to the data, after the fact. Enforcement has to live in the request path, where it can act on a request before the spend happens. If your cost tooling can't sit between the agent and the model, it can only ever report — it can't intervene.
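The difference can be sketched in a few lines of Python. Everything here is hypothetical: fake_model stands in for a provider call with a flat cost, and observe and enforce are illustrative wrappers, not any product's API. The observer records cost after the call has been paid for; the enforcer checks the projected spend before forwarding the call.

```python
from typing import Callable

ModelFn = Callable[[str], tuple[str, float]]  # prompt -> (reply, cost in USD)


class BudgetExceeded(Exception):
    """Raised before a call is forwarded, so no spend occurs."""


def observe(call_model: ModelFn, ledger: list[float]) -> Callable[[str], str]:
    """Observability: record the cost after the call has already happened."""
    def wrapped(prompt: str) -> str:
        reply, cost = call_model(prompt)
        ledger.append(cost)  # reporting only; nothing here can stop the call
        return reply
    return wrapped


def enforce(call_model: ModelFn, ledger: list[float], budget_usd: float,
            estimate_cost: Callable[[str], float]) -> Callable[[str], str]:
    """Enforcement: sit in the request path and decide before forwarding."""
    def wrapped(prompt: str) -> str:
        projected = sum(ledger) + estimate_cost(prompt)
        if projected > budget_usd:
            raise BudgetExceeded(f"projected ${projected:.2f} exceeds ${budget_usd:.2f}")
        reply, cost = call_model(prompt)
        ledger.append(cost)
        return reply
    return wrapped


def fake_model(prompt: str) -> tuple[str, float]:
    """Stand-in for a provider call; assumes a flat $0.25 per call."""
    return "ok", 0.25


# A looping agent behind an observer: every call goes through.
observed_ledger: list[float] = []
observed = observe(fake_model, observed_ledger)
for _ in range(10):
    observed("loop")

# The same loop behind an enforcer with a $1.00 budget.
enforced_ledger: list[float] = []
enforced = enforce(fake_model, enforced_ledger, budget_usd=1.00,
                   estimate_cost=lambda prompt: 0.25)
blocked = 0
for _ in range(10):
    try:
        enforced("loop")
    except BudgetExceeded:
        blocked += 1

print(f"observed spend: ${sum(observed_ledger):.2f}")
print(f"enforced spend: ${sum(enforced_ledger):.2f}, blocked calls: {blocked}")
```

The two wrappers see exactly the same traffic. The only difference is where the check sits relative to the call, which is the whole point of the observe-versus-enforce distinction.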
What real-time cost governance looks like
Because a gateway sits directly in the request path, it can do what a dashboard can't: act in the moment.
Budgets enforced in the hot path. Set a budget per tenant and per agent. A request that would breach it is throttled or blocked as it's made — not flagged on next month's statement.
Token-type attribution. See spend broken down by input, output, and cached tokens, attributed to the individual agent and request. "Which agent is burning the budget?" becomes a one-glance answer instead of a forensic exercise.
Auto-throttling runaway agents. The recursive-loop blowout is the canonical failure. When an agent's consumption spikes past its envelope, throttle it automatically — contain the blast radius before it compounds, the same way a circuit breaker trips before a fire spreads.
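The three capabilities above can be sketched together in Python. This is a toy under stated assumptions, not a description of how any particular gateway is implemented: CostGovernor, its method names, the tenant and agent names, and all the numbers are hypothetical. Budgets are counted in tokens rather than dollars to keep the arithmetic simple, and the budget check fires once a budget is exhausted rather than projecting the next request's size.

```python
from collections import defaultdict, deque


class CostGovernor:
    """Hypothetical in-path governor: budgets, attribution, and a spike throttle."""

    def __init__(self, budgets, envelope_tokens, window_seconds=60.0):
        self.budgets = budgets                  # {(tenant, agent): total token budget}
        self.envelope_tokens = envelope_tokens  # tokens allowed per rolling window
        self.window_seconds = window_seconds
        # Attribution: usage per (tenant, agent), split by token type.
        self.usage = defaultdict(lambda: {"input": 0, "output": 0, "cache": 0})
        self.recent = defaultdict(deque)        # (tenant, agent) -> (timestamp, tokens)

    def check(self, tenant, agent, now):
        """Decide before the call is forwarded: 'allow', 'throttle', or 'block'."""
        key = (tenant, agent)
        # Agents with no configured budget get 0 and are blocked by default.
        if sum(self.usage[key].values()) >= self.budgets.get(key, 0):
            return "block"
        window = self.recent[key]
        while window and now - window[0][0] > self.window_seconds:
            window.popleft()                    # forget usage older than the window
        if sum(tokens for _, tokens in window) >= self.envelope_tokens:
            return "throttle"                   # consumption spiked past the envelope
        return "allow"

    def record(self, tenant, agent, now, input_tokens, output_tokens, cache_tokens=0):
        """Attribute a completed call's tokens to the agent that made it."""
        key = (tenant, agent)
        self.usage[key]["input"] += input_tokens
        self.usage[key]["output"] += output_tokens
        self.usage[key]["cache"] += cache_tokens
        self.recent[key].append((now, input_tokens + output_tokens + cache_tokens))


governor = CostGovernor(
    budgets={("acme", "researcher"): 100_000, ("acme", "summarizer"): 5_000},
    envelope_tokens=20_000,
)

# A looping agent fires one 5,000-token call per second for ten seconds.
decisions = []
for second in range(10):
    decision = governor.check("acme", "researcher", now=second)
    decisions.append(decision)
    if decision == "allow":
        governor.record("acme", "researcher", second,
                        input_tokens=4_000, output_tokens=1_000)

print(decisions)
print(governor.usage[("acme", "researcher")])
```

In this run the looping agent is throttled once it has consumed its 20,000-token envelope inside the window, well short of its 100,000-token budget, and it is allowed again once the window has passed. The usage dictionary is the attribution record: it answers which agent spent what, by token type, without reconstructing it from an invoice.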
Cost is a security problem too
There's a reason cost control belongs next to security rather than in a separate finance tool. An agent burning tokens in a loop and an agent being driven by a prompt-injection attack can look identical from the outside — unusual, escalating, unattended behavior. The same control plane that inspects an agent for abuse is the natural place to enforce its budget, because it already sees every call. Fusing the two means uncontrolled cost and uncontrolled risk get caught by the same system — and, when that system is self-hosted, your spend data never leaves your infrastructure either.
The takeaway
The token economy turned AI cost from a slowly falling unit price into a volatile, agent-driven risk that can spike overnight. Dashboards that report yesterday's damage aren't governance. Real governance is enforcement in the request path: budgets that bite, attribution you can act on, and automatic throttling before a looping agent empties the account.
TrustGate handles this as one of its four pillars: real-time, actionable cost control that stops wastage as it happens, in your own infrastructure.
