AI agent token spend: the control sequence UK CFOs need in 2026
Per Gartner (the global research and advisory firm, in its press release dated 25 March 2026), agentic models burn 5 to 30 times more tokens per task than a standard generative AI chatbot. Per Forrester (the research and advisory firm, in its 2026 B2B Marketing, Sales, and Product Predictions released 28 October 2025), B2B companies will lose more than 10 billion US dollars in enterprise value from ungoverned generative AI in 2026. The CFO control sequence that addresses both risks is operational, not architectural.
UK mid-market firms that bring agent spend under control in 2026 do five things in order: connect the data, set per-agent budgets and named approvers, run insight read-only first, give action narrow bands, and report unit economics monthly. The result is fewer surprise bills and a quieter governance story for the board pack.
Agent spend is the new cloud bill, and it grew up faster
The story most AI vendors are telling in 2026 is that token prices are falling. The story finance leaders are actually living is that agent spend is climbing, in some cases sharply. The two facts coexist because volume is rising faster than price is falling.
Per Gartner (the global research and advisory firm, in its press release dated 25 March 2026, titled Gartner Predicts That by 2030, Performing Inference on an LLM With 1 Trillion Parameters Will Cost GenAI Providers Over 90 Per Cent Less Than in 2025), agentic models require 5 to 30 times more tokens per task than a standard generative AI chatbot. The reason is structural: an agent does not answer once and stop. It loops over a task, reads context from connected systems, retrieves documents, plans, acts, observes, and revises. Each cycle is another inference call. A reasoning loop running 10 cycles can consume 50 times the tokens of a single linear pass.
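The loop mechanics above can be sketched with illustrative arithmetic. Every number below is an assumption chosen to show the shape of the curve, not a measured figure: each cycle re-reads the growing context, so total spend grows roughly quadratically with cycle count.

```python
# Illustrative arithmetic only: why a looping agent multiplies token spend.
# The per-cycle numbers are assumptions, not measured figures.

BASE_PROMPT = 1_000      # tokens in a single linear chatbot pass (input)
CONTEXT_GROWTH = 400     # retrieved context added to the window each cycle
OUTPUT_PER_CYCLE = 600   # tokens the agent emits per plan/act/observe cycle

def loop_tokens(cycles: int) -> int:
    """Total tokens consumed by an agent that re-reads its growing context each cycle."""
    total = 0
    context = BASE_PROMPT
    for _ in range(cycles):
        total += context + OUTPUT_PER_CYCLE            # one inference call: input + output
        context += CONTEXT_GROWTH + OUTPUT_PER_CYCLE   # next cycle re-reads everything so far
    return total

single_pass = BASE_PROMPT + OUTPUT_PER_CYCLE
print(loop_tokens(10) / single_pass)  # ~38x a single pass on these assumptions
```

On these assumed numbers, 10 cycles consume roughly 38 times the tokens of one linear pass; with heavier retrieval per cycle the multiple climbs past 50.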
The price-volume crossover was easy to miss because, on a per-token basis, costs are still falling. Gartner forecasts that performing inference on a one-trillion-parameter LLM will cost providers over 90 per cent less by 2030 than in 2025. Per IDC research released in 2026, agent usage and inference demand are climbing in lockstep, with agent volumes expected to step up across 2026 and 2027 as enterprises move pilots into production. The cost per task for a customer service agent is materially lower than the cost of a contained human-handled ticket; the cost per task for a code-review agent is materially lower than the cost of the equivalent senior engineer time. The unit economics are real.
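The crossover fits in one line of arithmetic. The rates below are illustrative assumptions, not Gartner or IDC figures:

```python
# Illustrative only: per-token price falls while call volume rises faster.
price_drop_per_year = 0.35    # assumed 35% annual per-token price decline
volume_growth_per_year = 3.0  # assumed 3x annual growth in tokens consumed

bill_multiplier = (1 - price_drop_per_year) * volume_growth_per_year
print(bill_multiplier)  # 1.95: the invoice nearly doubles despite cheaper tokens
```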
The problem is that most enterprises do not measure unit economics. They measure top-line invoice. When the top-line invoice triples in a quarter because a new agent fleet went into production, the CFO finds out from the LLM provider's billing screen, not from the operating dashboard. That is the gap AIOS Command is designed to close: a connected operating layer that pairs token spend with the operating outcome on the same dashboard, so an AI fleet either pays back on its own metrics or shrinks before the next bill cycle.
Three failures CFOs see when agent spend goes off-piste
Three failure modes show up consistently when an enterprise loses control of agent spend. They are the operational signature of the Forrester prediction that follows. Per Forrester (the research and advisory firm, in its 2026 B2B Marketing, Sales, and Product Predictions released 28 October 2025), B2B companies will lose more than 10 billion US dollars in enterprise value in 2026 from declining stock prices, legal settlements, and fines tied to ungoverned generative AI. Forrester also finds that 19 per cent of buyers using genAI applications report feeling less confident in their purchasing decisions because of inaccurate information. The financial loss and the trust loss travel together.
- Recursive loops. An agent enters a reasoning cycle that does not terminate cleanly. It re-reads its own output, expands the task beyond the original scope, or retries failed sub-steps with longer context windows. Token consumption goes parabolic for a single user task. The CFO sees the cost line, not the cause.
- Context-window inflation. RAG architectures (the pattern of pulling relevant documents into the prompt before the model answers) inflate context windows by three to five times when the retrieval rule is not tight. Every additional thousand tokens on the way in is an additional charge multiplied by the agent's call volume.
- Always-on monitoring agents. A monitoring agent that runs 24/7 looks cheap per call. But a monitor polling once per second makes 86,400 calls per day, and across a fleet of monitors the line item is anything but cheap. Worse, most always-on monitors do not log the value of their actions in operating terms, so there is no unit economic to measure them against.
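All three failure modes yield to the same primitive: a hard per-task ceiling checked on every inference call. The sketch below is a minimal illustration; `TaskBudget` and its thresholds are hypothetical names, not a real gateway API.

```python
# Hedged sketch: a per-task token ceiling that terminates runaway loops.
# Class name, fields, and thresholds are illustrative assumptions.

class TokenBudgetExceeded(RuntimeError):
    pass

class TaskBudget:
    """Hard ceiling on tokens and cycles a single agent task may consume."""

    def __init__(self, ceiling_tokens: int, max_cycles: int):
        self.ceiling = ceiling_tokens
        self.max_cycles = max_cycles
        self.spent = 0
        self.cycles = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one inference call; stop the task if either ceiling is breached."""
        self.spent += input_tokens + output_tokens
        self.cycles += 1
        if self.spent > self.ceiling or self.cycles > self.max_cycles:
            raise TokenBudgetExceeded(
                f"task stopped at {self.spent} tokens / {self.cycles} cycles"
            )
```

In a deployment, each inference call would be wrapped in `charge`, and the exception routed to a named approver's queue rather than silently retried with a longer context window.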
Per Deloitte (State of AI in the Enterprise, April 2026, surveying 3,235 IT and business leaders across 24 countries), only 21 per cent of organisations report mature governance for AI agents, even as 74 per cent expect to use AI agents at least moderately by 2027. The operating gap is large and the buying volume is large; the cost story is the predictable consequence.
Per the McKinsey UK practice (in The new productivity paradox, May 2026), AI does not lift productivity in firms that have not first rewired the operating layer underneath it. The same logic applies to agent cost: agents do not stay inside a budget that has not been wired into the operating layer. Budget enforcement at the API gateway is necessary but not sufficient. The budget has to know what the agent is for and whether the agent is delivering.
Bring agent spend under control. Join the AIOS Command waitlist, from £250/mo.
Connect and identify growth opportunities across all your systems, then deploy AI operators to multiply your team. That sentence is the entire control sequence in one line. The first half is the prerequisite that every shipped agent needs and most pilots skip: a connected data estate so agents read facts from the operating systems rather than guessing from prompts. The second half is the action layer that turns the connected estate into output: AI operators inside controlled bands, multiplying the team rather than replacing it.
The two-layer model makes the cost mechanics legible. An insight team reads the operating systems continuously: AVA (the insight analyst) reports unit economics by agent (cost per task, value per task, payback by month) so the CFO can see which agents earn their token bill and which ones do not. DEX (the deal-flow analyst) tracks revenue-side outcomes such as pipeline coverage and forecast accuracy when sales agents are involved, so the unit economic includes the revenue side, not just the cost side. An action team handles execution inside narrow bands: KORA (the resolution operator) routes exceptions to named human approvers; KIA (the contracts watcher) enforces vendor and licence boundaries on what an agent is allowed to call; LEXI (the policy operator) keeps the audit trail complete so governance can read every action back to its input data and output system.
The integration layer is the cost control. AIOS Command connects to the systems the agents need (CRM, ERP, billing, banking, customer service, marketing, contracts, observability) so an agent does not need to retrieve, parse, and infer state that already exists in a system of record. The retrieval cost falls; the answer accuracy rises; the loop count drops. The AI agent governance playbook covers the seven controls behind this in detail. The agentic AI failure rate ops checklist covers the failure modes that a connected operating layer prevents, and the data silos research covers the prerequisite that has to land first.
The five-control sequence UK CFOs are running in 2026
The CFOs and CTOs who have brought agent spend under control without slowing the agent programme have run the same five controls in the same order. Skipping any one of them is the most common reason a runaway bill returns the next quarter.
- Connect the data first. The cheapest token is the one not spent. A connected data estate (CRM, ERP, billing, banking, customer service, contracts, observability) lets agents read state instead of inferring it. The retrieval bill falls and answer quality rises. AIOS Command's 900+ connector library is the practical shape of this control; the bottleneck is rarely connector availability and almost always the rule library that has to be built before any agent acts.
- Set per-agent and per-task budgets with hard ceilings. Per Gartner, agents burn 5 to 30 times more tokens per task than a standard chatbot, so the budget has to be set at the agent and the task, not at the model. Budgets are enforced at the gateway, but the unit they meter is the operating one (a customer service ticket, a finance reconciliation, a sales follow-up), not the model API call.
- Run insight read-only before any action posts. Stand AVA up in read-only mode for one full operating cycle. AVA reports unit economics by agent across the cycle (token cost, completion rate, value delivered). The exercise is what surfaces the recursive-loop, context-window inflation, and always-on failure modes before they ship to production.
- Give action narrow bands with named approvers. Once the unit economics are stable, give KORA and KIA narrow operating bands: a defined task scope, a defined materiality threshold, a defined named human approver. The agent proposes; the named approver approves; the action posts. This is the band Deloitte's State of AI says fewer than a quarter of organisations have governance for. Getting it right is the moat.
- Report unit economics monthly. Cost per task, value per task, payback period, agent-actioned vs human-actioned mix, exceptions per thousand actions, and audit findings. Anchor the board narrative on unit economics, not invoice totals. The CFO conversation moves from "why is the LLM bill up" to "which agents earn their tokens and which ones do not".
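The monthly roll-up in the last control can be as simple as a per-agent record. A minimal sketch, with illustrative field names and sample figures:

```python
# Hedged sketch of a monthly unit-economics roll-up by agent.
# Field names and the sample figures are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class AgentMonth:
    name: str
    token_cost_gbp: float      # total token spend for the month
    tasks_completed: int
    value_per_task_gbp: float  # operating value attributed per completed task

    @property
    def net_value(self) -> float:
        """Value delivered minus token spend for the month."""
        return self.tasks_completed * self.value_per_task_gbp - self.token_cost_gbp

fleet = [
    AgentMonth("service-agent", 1_200.0, 4_000, 1.50),
    AgentMonth("always-on-monitor", 900.0, 0, 0.0),  # no measurable outcome
]
for a in fleet:
    verdict = "keep" if a.tasks_completed > 0 and a.net_value > 0 else "review"
    print(f"{a.name}: net £{a.net_value:,.0f} -> {verdict}")
```

A monitor with zero attributable tasks shows a negative net value every month, which is exactly the signal that retires it before the next bill cycle.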
What good looks like by quarter four
UK mid-market firms that ran this five-control sequence through 2026 typically report three things by quarter four. First, the agent invoice is predictable: month-on-month variance falls inside the range a CFO is willing to forecast. Second, the agent fleet is smaller and sharper: usually two or three agents have been retired (often the always-on monitors with no measurable operating outcome) and the surviving agents have a stable unit economic. Third, the audit trail is complete: every agent action is traceable back to the input data and the output system, which is what makes the next stage of governance defensible to a board, an auditor, and the regulators whose scrutiny the Forrester prediction implicitly anticipates.
None of this requires a model swap. The operating layer runs on top of the existing model providers; the budgets and approvers stay where they are; the rule library compounds across cycles. UK case studies illustrate the operating model in practice for firms that bought into the connected layer first and the agent fleet second. The CFO board narrative is straightforward: connect the data, budget by agent and task, run insight before action, give action a named approver, and report unit economics. UK mid-market firms that ran this sequence in 2026 are delivering the cost story most boards do not expect to have in place until 2027 or later.
Frequently asked questions
Why are AI agent costs growing faster than expected in 2026?
Per Gartner (the global research and advisory firm, in its press release dated 25 March 2026), agentic models require 5 to 30 times more tokens per task than a standard generative AI chatbot. The cost of a single task is the cost of a chatbot session multiplied by the loop count. Gartner forecasts that performing inference on a one-trillion-parameter LLM will cost providers over 90 per cent less by 2030 than in 2025, but the volume rise outpaces the price drop in the near term, which is why CFOs are seeing surprise bills now.
How big is the governance risk on ungoverned generative AI in 2026?
Per Forrester (the research and advisory firm, in its 2026 B2B Marketing, Sales, and Product Predictions released 28 October 2025), B2B companies will lose more than 10 billion US dollars in enterprise value in 2026 from declining stock prices, legal settlements, and fines tied to ungoverned generative AI. Per Deloitte (State of AI in the Enterprise, April 2026, surveying 3,235 leaders across 24 countries), only 21 per cent of organisations report mature governance for AI agents, even as 74 per cent expect to use them at least moderately by 2027.
What does an AI agent control layer actually contain?
Five elements, in order: a connected data estate so agents read facts from the operating systems rather than guessing; per-agent token budgets and per-task budgets with hard ceilings; a named human approver inside every action band; a complete audit trail of every agent action linked back to the input data and the output system; and unit-economics reporting that pairs token spend with the operating outcome (close days, DSO compression, deflection rate). Without all five, the agent is a cost centre with limited line of sight.
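One way to make the five elements concrete is a single per-agent policy record that a gateway and an audit process can both read. The field names and values below are illustrative assumptions, not an AIOS Command schema:

```python
# Hedged sketch: the five control-layer elements as one per-agent policy record.
# All field names and values are illustrative assumptions.

from dataclasses import dataclass

@dataclass(frozen=True)
class AgentPolicy:
    agent_id: str
    allowed_sources: tuple        # connected systems the agent may read from
    task_budget_tokens: int       # hard per-task ceiling
    monthly_budget_tokens: int    # hard per-agent ceiling
    approver: str                 # named human who signs off actions
    audit_log: str                # where every action is written, linked to inputs

policy = AgentPolicy(
    agent_id="finance-reconciliation",
    allowed_sources=("erp", "billing", "banking"),
    task_budget_tokens=50_000,
    monthly_budget_tokens=5_000_000,
    approver="financial.controller@example.co.uk",
    audit_log="audit/finance-reconciliation.jsonl",
)
```

A record like this gives the gateway something to enforce, the approver something to own, and the auditor something to read back, which is the whole point of the five elements travelling together.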
Where do UK CFOs typically start when bringing this under control?
Most start with an inventory: which agents exist, who owns them, what they cost, and what they actually deliver. The inventory usually surfaces three or four agents that have grown beyond their original scope and a long tail of small agents that nobody owns. The CFO move is to consolidate onto an operating layer that connects all systems, set per-agent budgets and named approvers, and pair token spend with outcome metrics. AIOS Command from Implement AI is the layer UK mid-market firms typically use to run that consolidation.