·45 min read

The week the agent layer closed

Anthropic and Google both re-priced the agent layer in the same week of May 2026, and the capacity argument behind both moves is genuine. What concerns me is the part underneath: when a company builds its processes on agentic systems, every internal step of every workflow gets metered through the vendor's surface, and the customer has no way to see how many of those steps there are or what they actually cost.

Unless you adopt their entire eco-system where they decide the rules of engagement, the tools, how much you get charged and how you would use the technology. This is classic vendor lock-in strategy and it sets the ground for the years to come.

Every single business of any size should think through thoroughly, because it is going to affect their agility, their business strategies, their risk, their independence and freedom.

agentic-eraessays
The week the agent layer closed

In this article

Anthropic's pricing announcement landed on a Tuesday and coincidentally Google's arrived three days later with a similar stand. By the end of the week it was clear that these were not two independent product decisions. Rather, the timing and the specific shape of what each company chose to leave open and what it chose to close, pointed at the same strategic move executed through different levers.

On 13 and 14 May 2026, Anthropic announced that effective 15 June the Claude Agent SDK, the non-interactive claude -p CLI invocation, the Claude Code GitHub Actions integration, and all third-party agents authenticating through Claude subscriptions would move out of the standard subscription rate-limit pool and onto a separate dollar-denominated credit pool: twenty dollars for Pro, one hundred dollars for Max 5x, two hundred dollars for Max 20x, metered at full API list prices, with credits that do not roll over.1Anthropic, "Claude Code: Agentic Usage and Billing Changes," 13 May 2026. The credit pool applies to the Claude Agent SDK, claude -p non-interactive CLI, Claude Code GitHub Actions integration, and all third-party agents authenticating through Claude subscriptions. Credits are metered at full API list prices and do not roll over between billing cycles. Effective 15 June 2026. Interactive Claude.ai, interactive Claude Code in the terminal, and the Cowork desktop product remain unchanged on the subscription pool.

A few days later, Google published its own announcement: Gemini CLI and the Gemini Code Assist IDE extensions would stop serving Google AI Pro, Ultra, and free Code Assist users on 18 June, replaced by the new Antigravity CLI. Gemini CLI would remain accessible only through paid API keys or the Gemini Enterprise Agent Platform.2Google, "Introducing Antigravity CLI," 19 May 2026. Effective 18 June 2026. Gemini Code Assist Standard and Enterprise licence holders, and Gemini Code Assist for GitHub customers through Google Cloud, retain unchanged access. Consumer-tier users are migrated to the Antigravity CLI, which is built on the Antigravity Shared Agent Engine, a server-side multi-agent runtime.

Do you think it is coincidental that the announcement is done in the same timespan and that the change will apply on both in the same week in June?

The official rationale offered for both moves is sensible and partly correct:

Anthropic's rationale, articulated by Boris Cherny in interviews and on social media, is that third-party tools operating outside the prompt cache system are difficult to support sustainably, that subscription pricing was never designed for continuous automated demand, and that some users were running token consumption worth hundreds or thousands of dollars on twenty-dollar plans through tools like OpenClaw and similar third-party harnesses.3Boris Cherny, Anthropic product lead for Claude Code, in interviews and social media posts accompanying the announcement. The specific example cited was users proxying agentic workloads through twenty-dollar Pro subscriptions via third-party harnesses like OpenClaw, generating token consumption worth hundreds or thousands of dollars at API list prices. The prompt cache system that makes interactive sessions economically viable does not cover headless agentic sessions orchestrated by external tools.

Google's rationale skirts the pricing question entirely, presenting the change as a product evolution: workflows had outgrown a single-agent terminal CLI and now required multi-agent orchestration with a unified backend.

Both rationales are factually accurate as far as they go. And are also doing strategic work that the public communications are careful not to name. What I want to walk through in this piece is what these two announcements actually mean when you read them together, why the popular counterarguments to the lock-in concern do not hold up, and what the consequences look like for the different kinds of organisations and builders who now have to decide what to do about it.

What I would urge anyone reading this, particularly those making infrastructure and process decisions for organisations rather than individual choices for themselves, is to recognise that the period of artificial market fluidity that characterised the first eighteen months of agentic AI has ended. The choices are still real, the open layers still exist, and the open-source models are still available. What has gone away is the subsidy that was making certain choices artificially cheap, and the comfortable assumption that one could defer the strategic question of vendor commitment indefinitely while the market sorted itself out. The market has sorted itself out. The vendors have chosen. And the evidence below should help clarify what that means for your specific situation.


What both vendors are actually doing

1 Re-pricing at marginal cost Headless agentic usage moves from flat-rate subsidy to API list prices 2 First-party capture One subsidised path left open: the vendor's own harness, collecting telemetry 3 Third-party suffocation Orchestration layer loses its economic thesis overnight 4 Customer segmentation Monetisation curve rebuilt: cheap interactive, metered programmatic, enterprise API

1The first move, common to both companies, is a re-pricing of the heaviest workload at something closer to its true marginal cost. Headless agentic usage is by a substantial margin the largest per-user token consumer in any AI vendor's product mix. It bypasses the prompt cache optimisations that make interactive sessions economic to serve, and pricing it at the same flat rate as a chat-grade interactive workload was a transitional subsidy that could not survive sustained agentic demand. Re-pricing it at API list rates, or moving it off the interactive surface entirely, is the correct economic response in isolation, and it would be cynical to pretend the capacity argument has no weight given the public data about agentic demand outpacing GPU supply.

2The second move, less openly discussed, is the deliberate capture of agentic workflow inside first-party surfaces. Both companies are leaving exactly one path wide open and closing every other one. For Anthropic, that path is interactive Claude Code running in the terminal, which stays on the unchanged subscription pool. For Google, that path is Antigravity 2.0 and its CLI, a unified, server-side, multi-agent harness designed around the Antigravity Shared Agent Engine. The economic effect is that any developer who wants high-volume agentic work without paying API rates has one rational choice: live inside the vendor's own harness.

The reason this matters more than the immediate inference revenue is that every accepted suggestion, every rejected diff, every plan-mode exploration, and every tool call inside the first-party harness becomes a behavioural signal that feeds the next model generation's reinforcement learning loop, and that signal is worth more to long-term competitive position than the inference revenue it displaces.4The telemetry argument: every interaction inside the first-party harness, including accepted and rejected code suggestions, plan-mode explorations, tool call patterns, and sub-agent configurations, generates behavioural data that feeds the vendor's reinforcement learning pipeline for the next model generation. This data is not available to the vendor from third-party harness usage, even when the underlying model is the same. The competitive value of this signal accumulates across model generations and compounds over time.

3The third move, and the one I find genuinely existential for the broader ecosystem, is the pre-emptive suffocation of the third-party harness category before it becomes the platform layer. Through 2025 and into early 2026 a credible thesis was building that the durable value in the AI stack would migrate up from the foundation model to the orchestration and harness layer, embodied by projects like OpenClaw, Goose, Conductor, Zed's agent mode, Jean, and a long tail of open-source frameworks.5The third-party harness and orchestration category includes OpenClaw, Goose (Block), Conductor, Zed's agent mode, Jean, and numerous open-source frameworks. The shared thesis: if the orchestration layer abstracts model choice, the foundation model commoditises, and the harness owner captures customer relationships, integrations, and workflow lock-in. This thesis was credible through early 2026 and had attracted significant venture investment.

The logic was straightforward: a harness that abstracts model choice commoditises the foundation model underneath, and the harness owner accumulates the customer relationships, integrations, and workflow lock-in that the model vendor cannot easily disintermediate. By pulling the subscription subsidy in the same week, both vendors make the first-party harness the only economically rational choice for any serious volume of agentic work. The third-party harness market freezes in place before it can build distribution moats.

4The fourth move is a customer segmentation that the previous pricing structure was actively preventing. Under the old regime, a single twenty-dollar Pro subscription could absorb the token consumption of what was effectively a small engineering team's worth of automated work, keeping the monetisation curve flat at exactly the segment that most needed differentiation. The new structure rebuilds that curve: consumer interactive usage stays cheap and capped, programmatic usage moves onto a metered credit that exposes true cost to the user, and the upper tier of headless workloads is reserved for enterprise API contracts or paid API key consumption. Google's segmentation is even more explicit. Gemini Code Assist Standard and Enterprise licence holders, and Code Assist for GitHub customers through Google Cloud, retain unchanged access, while consumer-tier users are pushed off the headless surface entirely.6Google's segmentation is explicit in the announcement: Gemini Code Assist Standard and Enterprise licence holders retain access to Gemini CLI. Consumer-tier users (Google AI Pro, Ultra, free Code Assist) are migrated to Antigravity CLI. The effect is a clean separation between enterprise customers on negotiated contracts and consumer customers on list pricing, with the consumer tier bearing the full cost of the transition.

Taken together, the four moves mark the end of the flat-rate subscription era for agentic AI and the beginning of a deliberate differentiation strategy. Each vendor is now constructing a first-party agent surface optimised for its own model, accumulating telemetry that only its own model benefits from, and training its users into workflows that do not transfer cleanly across vendor boundaries. The era of "the agent is just a wrapper over any model" is ending, and the era of "the agent is the vendor's product and the model is its engine" is beginning.

The era of "the agent is just a wrapper over any model" is ending, and the era of "the agent is the vendor's product and the model is its engine" is beginning.

The pricing wedge

The framing of this shift as vendor lock-in deserves to be made precise, because the technical conversation tends to treat lock-in as a binary state rather than as an economic gradient with a specific mechanism. The mechanism here is price discrimination, and the magnitudes are stark enough that the word lock-in is justified despite the absence of any technical incompatibility between vendors.

A developer running high-volume agentic work inside a Claude Code Max 20x subscription pays two hundred dollars a month for what generates effective token consumption in the range of several thousand dollars at API list prices. A developer running the same workload through a third-party orchestrator paying API rates, even at the relatively favourable blended rate of around one and a half dollars per million tokens that some routers offer, can easily accumulate seven hundred dollars per day on a serious autonomous coding workload, which annualises to a quarter of a million dollars per developer.7Estimate based on publicly available API pricing and reported workload patterns. Max 20x subscription: $200/month, $2,400/year. API path at blended rate of approximately $1.50 per million tokens: sustained high-volume agentic coding workloads have been reported to consume token volumes that translate to approximately $700/day, annualising to roughly $250,000 per developer. The 50–100× differential is the range across different workload intensities and blended rates.

THE PRICING WEDGE Same developer, same workload, two paths First-party harness (Max 20x subscription) $2,400/yr $4,800 ??? Subsidies don't last. The gap will close. Third-party orchestrator (API list rates) ~$250,000/yr ~100× Blended API rate ~$1.50/M tokens. Sustained high-volume agentic workload.

The differential between the first-party subscription path and the third-party orchestrator path is a fifty to one hundred multiple, large enough that no third-party orchestrator can absorb it regardless of how superior its product is on quality, ergonomics, or integration breadth. The lock-in is enforced by a deliberate pricing wedge that the vendors set and adjust at will, with switching cost and technical incompatibility playing supporting roles to the price differential rather than carrying the weight themselves.

There is an important secondary point that follows from this. The lock-in is conditional on the vendors maintaining the current subsidy on their interactive surfaces, and the subsidy is rationally vulnerable to being raised once the market has been sufficiently captured. The current pricing reflects an aggressive market-share posture rather than a stable equilibrium, and any business strategy that assumes today's prices will hold next year is making an assumption about vendor strategic restraint that the history of comparable platform moves does not support. The two hundred dollar Max 20x of today becomes the four hundred dollar tier of next year, and the businesses and teams that depend on the subsidy continuing at its current level are taking a position on vendor pricing rather than on technology choice.


Why MCP is not the answer to this

The Model Context Protocol, which Anthropic published as an open standard and Google has adopted across the Antigravity stack alongside OpenAI's adoption in their Agents SDK, looks at first glance like the counterargument to the lock-in framing.8The Model Context Protocol (MCP) was published by Anthropic as an open specification in late 2024. Google adopted MCP across the Antigravity stack and announced managed MCP servers through Apigee at Cloud Next 26. OpenAI integrated MCP support into the Agents SDK. The protocol is now the de facto standard for tool integration across all three major vendors. If the integration layer between models and tools is genuinely open and cross-vendor, then surely the lock-in concern is overstated, and the third-party tool ecosystem can survive by sitting on top of MCP and serving whichever harness the customer happens to be using. This is the most common rejoinder I hear when raising the lock-in question, and I think it is wrong in a specific way that is worth walking through.

THE PROTOCOL IS OPEN. THE GATE IS CLOSED. VENDOR COMMERCIAL ENVELOPE Pricing, billing, telemetry, terms of service First-party harness Claude Code / Antigravity — controls invocation, routing, preference THE GATE Foundation model Decides which tools to invoke — trained to prefer first-party MCP — Model Context Protocol Open, portable, cross-vendor — but passive, invoked at model's discretion Stripe MCP Internal tools 3rd-party MCP MCP sits below the gate. The vendor controls everything above it.

MCP is open at the protocol level and strategically inert at the economic level, and the two facts coexist without contradiction because they operate at different layers of the stack. MCP describes how tools are invoked by a model and how they communicate context back, and it does this in a way that is genuinely portable across vendors. What MCP does not describe, and what it cannot address through any technical means, is who decides which tool gets invoked and under what economic conditions. That decision sits with the model running inside the harness, the harness sits inside the vendor's commercial envelope, and the vendor sets the pricing wedge between using its harness and using a competitor's. The openness of MCP is genuine, but it operates above the gate, not at the gate.

There is a further problem specific to MCP that shapes what businesses can be built on it. MCP servers are invoked by the model at the model's discretion. They cannot communicate at will, they cannot solicit user attention, they cannot initiate interactions, and they have no agency in the conversation beyond responding when the model decides to call them.9MCP servers are invoked by the model at the model's discretion and cannot initiate interactions, solicit user attention, or communicate at will. Server-initiated notifications and sampling (requesting completions back) exist in the protocol specification but are housekeeping channels rather than commercial primitives. The user pays one price for the inference session; the MCP provider is invisible inside that price. The marginal cost of building an MCP server with AI assistance has dropped to roughly half a day of engineering work, while marginal revenue per invocation remains zero. This asymmetry makes standalone MCP monetisation structurally unviable. The user perceives a single price for the inference session, the MCP provider is invisible inside that price, and the provider therefore cannot extract rent at the call site.

The viable monetisation paths reduce to two categories: wrappers around services that are already independently monetised, like the official Stripe, Salesforce, or GitHub Enterprise MCP servers, where the protocol is an access path to value the customer already pays for; and internal corporate integrations that organisations build for themselves and never sell. The category that does not work is the standalone MCP-as-a-service business, because the marginal cost of producing an MCP server using AI assistance has collapsed to roughly half a day of work while the marginal revenue per call remains structurally zero.

Layered on top of the monetisation difficulty is the vendor's ability to throttle, bias, or replace any MCP that becomes commercially interesting. The vendor controls invocation through the model. The vendor can fine-tune the model to prefer first-party tools over functionally equivalent third-party MCPs. The vendor can build paid first-party features that occupy the same functional niche as a popular free MCP and route the value through its own billing rather than through the ecosystem.10The pattern of first-party integration displacing third-party MCP providers is visible across several categories: spreadsheet interaction, web browsing, calendar management, code review, and document handling. In each case, the vendor ships a built-in integration that occupies the same functional niche as a popular community MCP server, routing the value through its own billing and feature set. The effect is systematic enclosure of the integration surface, with the boundary between first-party and third-party territory set by the vendor's roadmap. The pattern is already visible across spreadsheets, browsers, calendars, code review, and document handling, with first-party integrations being shipped at exactly the points where the MCP ecosystem might otherwise have monetised.

MCP is open at the protocol level and strategically inert at the economic level, and the two facts coexist without contradiction because they operate at different layers of the stack.

This is the smoke-in-the-mirror dynamic. MCP is genuinely useful infrastructure and the protocol does meaningful portability work at the integration layer, particularly for internal corporate use cases that remain its dominant application. What it does not do, and what it is sometimes implicitly claimed to do, is undermine vendor lock-in or threaten the strategic position of the harness layer that sits above it. The protocol is open and the gate above it is closed, and the openness of the one does not affect the closure of the other.


The picture that emerges when you assemble these pieces is that the agent layer of the AI stack closed materially in a single week of May 2026, and the closure has consequences that differ sharply depending on where you sit. The implications for a two-person startup building an agent product are not the same as the implications for a mid-size company that has spent the last year weaving AI into its internal workflows, and neither of those is the same as what a large enterprise with procurement leverage and infrastructure budget faces. What follows is a walk through each of these positions, because I think the differences matter at least as much as the general trend.

SegmentExposureOptionsLock-in risk
StartupsExistential — product economics broken by the pricing wedgePivot to open-source models, own a deep domain, or negotiate enterprise API termsSurvival risk highest, but switching cost lowest
Mid-size companiesSevere — metered billing inverts automation ROI, no leverage to negotiateAccept vendor terms, constrain automation to the subscription pool, absorb routing complexityHighest — already invested, cannot negotiate, cannot self-host
Large enterprisesSignificant — long-cycle commitment, but with negotiating powerVolume pricing agreements, private inference, multi-vendor as negotiating positionModerate — options exist but all require capital
Developer toolsHigh — products assumed cheap programmatic accessProve per-call ROI or become features of the vendor's harnessHigh — economic viability depends on vendor pricing decisions
Open-source pathOpportunity — becomes the marginal substitute for routine workHost locally or route through abstraction layers at API ratesLow for routine tasks, does not cover frontier reasoning

Product builders and startups

Through 2024 and 2025, venture capital flowed into the agent and orchestration layer on the thesis that the harness would become the durable value capture point in the AI stack. Startups building coding assistants, workflow automation tools, AI-powered CI/CD pipelines, and multi-agent frameworks were all operating on the same structural assumption: that cheap subscription tokens would remain available for programmatic use, and that the startup could differentiate on orchestration quality, domain expertise, or developer experience while the foundation model underneath became increasingly commoditised.11The agent and orchestration startup wave of 2024–2025 was built on the assumption that foundation model access would remain cheap enough to embed into third-party products at scale. Funding rounds across the category reflected this thesis: orchestration-layer companies, multi-agent framework builders, and AI-native developer tools raised on valuations that assumed sustainable unit economics at subscription-equivalent rates. The pricing changes of May 2026 invalidate the economic premise without changing the product quality.

That assumption died in a week. The pricing wedge means that any product built on top of a vendor's model through the API path now faces a cost structure that the subscription-path competitor, the vendor's own first-party harness, will always undercut by a factor of fifty to a hundred. The startup cannot match the price. It cannot absorb the differential. And it cannot pass the cost to its customers without making its product dramatically more expensive than the vendor's own offering for what appears, from the buyer's seat, to be similar functionality.

The survivors, as I see it, will fall into three categories.

  • Negotiated API economics. Companies that had already built enterprise API relationships with direct pricing agreements, where the economics are negotiated rather than list-rate.
  • Open-source and local inference. Companies that pivot to open-source and locally-hosted models, accepting the quality trade-off on the hardest tasks in exchange for economic independence from the vendor's pricing lever.
  • Deep domain ownership. Companies that own a domain narrow enough and deep enough that the vendor is unlikely to enter it directly, where the product's value lives in the domain expertise rather than in the orchestration, and where the foundation model is a component rather than the product.

Everyone else is holding a product whose unit economics just changed by two orders of magnitude, and the change is not temporary.

But the cost problem is only the first layer. Step back and ask what the startup is actually offering its customers. The product is a service that takes an input, runs it through a sequence of agentic steps, and delivers an output. The customer pays a fee per transaction or a monthly subscription, and the startup's margin is the difference between that fee and the cost of the inference chain underneath. Under the old subsidy, those chains were cheap enough that the margin arithmetic worked. Under API list rates, it does not, and it gets worse as complexity grows.

Take a concrete example. A startup sells an automated customer support product. A single inbound email triggers an agent that reads the message, retrieves the customer's history, classifies the intent, drafts a response, checks it against policy, and sends it. That is six or seven inference calls at minimum, and each one runs at API list price. If the blended cost per call is even a few cents, and the customer is paying five or ten dollars a month for the service, the startup is losing money on every active user. The product that looked like a high-margin SaaS business under the subsidy turns out to be a premium service with shrinking or negative margins at scale.12The margin compression is structural rather than operational. Each step in an agentic chain incurs an inference cost that the startup cannot reduce without degrading the product. The number of steps is determined by the complexity of the task, not by the startup's engineering. As tasks grow more complex, the cost per transaction increases faster than the price the customer is willing to pay, creating a negative margin curve at exactly the point where the product should be scaling.

COMPOUNDING ERROR ACROSS AN AGENTIC CHAIN Each step at 95% accuracy — end-to-end reliability drops fast 100% 80% 60% Accuracy 1 step 2 3 4 5 6 7 95% 90% 86% 81% 77% 74% 70% 7 steps at 95% each = 0.95⁷ ≈ 70% end-to-end. One in three requests fails.

And that is before you account for the quality problem, which compounds alongside the cost. I dissected this in detail in When autonomous agents are the wrong answer, but the short version is that each step in an agentic chain is probabilistic. The model might misclassify the intent, retrieve the wrong context, draft a response that misreads the tone, or fail a policy check that a human would have passed. The error rate at each step may be low, but errors compound across a chain: a sequence of seven steps each operating at ninety-five percent accuracy delivers a correct end-to-end result only seventy percent of the time. The startup is selling a product that costs more per transaction than the customer expects to pay, and that produces the wrong output roughly a third of the time on non-trivial tasks, and the customer has no visibility into how many steps ran, what they cost, or where the errors were introduced, because the chain is opaque by design.

This is what the startup is actually building: a premium-priced, probabilistic workflow where the margins decrease as usage grows and the quality degrades as complexity increases, all running on infrastructure whose pricing is set unilaterally by a vendor who also competes with the startup's product directly.

The structural problem for startups: The vendor's first-party harness will always be subsidised below the cost that any third-party product built on API rates can match. This is not a gap that better engineering or a superior user experience can close, because the gap is in the pricing, not in the product. And the product itself, a probabilistic chain of opaque inference steps with compounding error rates and no customer-visible cost breakdown, is a harder sell than the demo ever suggested.

Mid-size companies and the metering trap

The segment I find most exposed by this shift is the mid-size company, roughly one hundred to a few thousand employees, that spent the past two years weaving AI into its operational fabric: customer service workflows, procurement approvals, compliance checks, HR onboarding, contract review, internal policy enforcement, and the dozens of cross-departmental processes that keep a company running day to day.

These companies face a specific problem that the startup does not share and that the large enterprise can absorb. Under the previous pricing, AI tooling was a per-seat subscription cost: predictable, budgetable, and indifferent to how intensively any department used the tool. A company running agentic workflows across operations, finance, and customer support paid a known monthly line item per seat, and the more those teams automated through the tool, the better the return on that line item looked. Automation made the subscription more valuable, not more expensive.

Under the new pricing, the moment any of those workflows crosses into the programmatic surface, into automated approvals, into policy-checking agents, into any of the headless patterns that the most operationally ambitious companies have built over the past year, the cost model inverts.12Under subscription pricing, the cost of AI tooling was a fixed per-seat expense indifferent to usage intensity. Under metered credit pricing, the cost scales with the volume of programmatic work. For companies that had moved significant operational workflow into automated pipelines, the transition from per-seat to per-token billing can multiply the effective cost by an order of magnitude or more, depending on automation intensity. What was a fixed per-seat cost becomes a usage-based cost that scales with the volume of automated work. The more a company automates, the more it pays. The return-on-investment calculation for automation, which was straightforward under the subscription model, now contains a variable cost component that grows with the automation's own success.

WHAT ONE EMAIL ACTUALLY COSTS PROJECTION — ILLUSTRATIVE ESTIMATE AT API LIST RATES Each box is a metered inference step invisible to the customer Customer email arrives AGENT 1 — INTAKE & CONTEXT ASSEMBLY Read & parse email ~$0.06 Load customer profile ~$0.40 Load constraints ~$0.22 Retrieve history ~$0.70 Classify & route ~$0.14 AGENT 2 — REASONING & DRAFTING Retrieve policies ~$0.42 Cross-ref account ~$0.28 Draft response ~$0.70 Validate rules ~$0.22 Check completeness ~$0.15 AGENT 3 — REVIEW & DELIVERY Tone & brand ~$0.15 Compliance review ~$0.48 Format & send ~$0.08 Total: ~$4 per email Customer sees: 1 reply At 200 emails/day: ~$800/day — ~$24,000/month — for one workflow

To see why this matters beyond the billing page, consider a workflow that most companies will recognise. An email arrives from a customer. An agent reads it, parses the intent, searches for related context across the company's systems, evaluates what the customer needs, and prepares a handoff for the next agent in the chain. That second agent checks the account history, reviews the relevant policy, drafts a response, and passes it to a third agent for tone and compliance review before a visible action is taken, perhaps a reply, perhaps a support ticket, perhaps an escalation to a person. Between the email arriving and the response leaving, a dozen or more internal steps ran, each consuming tokens, each metered, and each probabilistic rather than deterministic in a process the business treats as reliable (I wrote about this tension in The friction between probabilistic and deterministic). The customer saw one interaction. The company, depending on the vendor's billing granularity, may or may not see the full chain underneath. In a subscription world, the cost of that chain was folded into a flat monthly fee and nobody thought about it. In a metered world, the cost scales with the number of internal loops the vendor's agent architecture runs, and that architecture is something the customer did not design, cannot inspect, and has no way to simplify.

This is where the shift stops being a pricing adjustment and becomes a business model problem. These companies budgeted for SaaS, a known cost per seat per month, and they are now exposed to a cost curve that scales with the complexity of processes they cannot see into. The CFO who approved the per-seat subscription understood the number. The metered credit that replaces it for programmatic workloads introduces a variable that has to be modelled, monitored, and capped, and the institutional capacity to do that does not exist in most mid-size companies because they have never needed it before.

The lock-in compounds the problem. A mid-size company lacks the scale to negotiate enterprise pricing agreements directly with Anthropic or Google. It lacks the infrastructure budget and talent to build private inference capability. And it has already accumulated switching costs in the form of agent configurations wired into procurement workflows, compliance review chains, customer service routing, onboarding sequences, and institutional knowledge built around a specific vendor's harness, all of which would need to be rebuilt if it moved. The practical result is that these companies are captive to the vendor's pricing decisions in a way that neither startups, who can pivot to open-source models before the switching costs accumulate, nor large enterprises, who have procurement leverage, are captive.

Per-seat → per-token
The shift from subscription to metered credit for programmatic workloads inverts the ROI of automation: the more a company automates, the more it pays

Large enterprises and the hardware question

Large enterprises sit in a structurally different position, and the difference is worth being precise about rather than treating all companies as uniformly affected. They have options. But those options touch every part of the organisation, not just the technology department, and the decisions required are as much about governance, accountability, and operational risk as they are about infrastructure.

A company with tens of thousands of employees across multiple offices, departments, and geographies has by now woven AI into processes that span the entire business: legal review, contract management, procurement approvals, HR policy enforcement, regulatory compliance, customer operations, financial reporting, and internal communications. Engineering is one department among many. The exposure to this pricing shift reaches well beyond the IT budget. It raises a question about how the company runs, who is accountable when an agentic workflow makes a consequential decision inside a business process, and what happens when the cost of those workflows changes by an order of magnitude on thirty days' notice.

The first option is negotiated volume pricing. A large enterprise with an established cloud procurement function can negotiate directly with Anthropic or Google at rates that sit between the subscription subsidy and the list API price. The enterprise commits to volume, the vendor discounts the rate, and both sides get a number they can budget around. The enterprise does not face the same fifty-to-one-hundred-times pricing wedge that a smaller customer on list rates does, because the enterprise has something the vendor wants: a large, predictable commitment that justifies discounting. But a negotiated rate does not resolve the governance problem. The enterprise still cannot see inside the agent chains running across its departments, still cannot audit how many inference steps a compliance review or a contract analysis actually consumed, and still depends on the vendor's architecture decisions for processes that carry real legal and financial accountability.13Enterprise volume pricing agreements for AI inference follow the same pattern as cloud infrastructure procurement: committed spend in exchange for discounted rates. These agreements reduce the pricing wedge but do not eliminate the structural dependency on the vendor's architecture, telemetry practices, or billing granularity. The enterprise negotiates the rate but not the metering methodology, which means the vendor retains control over how consumption is measured and reported.

The second option, and the one I hear discussed more frequently since the announcements, is private inference. The open-source model landscape has matured to the point where running DeepSeek V3, Gemma 4, Qwen 3, or Llama on dedicated hardware is viable for a meaningful share of operational workloads, particularly for routine, high-volume tasks where the quality gap between open models and frontier models is narrow enough to be acceptable.14As of May 2026, the open-source model landscape includes DeepSeek V3, Gemma 4 (Google), Qwen 3 (Alibaba), and Meta's Llama family. On routine structured tasks, the quality gap between these models and frontier models has narrowed substantially. On long-horizon reasoning, complex multi-step decision chains, and tasks requiring nuanced judgement, frontier models retain meaningful advantages. The crossover point varies by task type and acceptable error rate. The economics of this path are real but not trivial. A GPU cluster capable of serving an enterprise's operational workloads across departments requires a capital investment in the tens of millions of dollars, dedicated cooling and power infrastructure, specialised ML operations talent, and a deployment timeline of six to twelve months before the first workload runs. The appeal is not primarily about cost savings. It is about control: the ability to audit every step of every agentic process, to know exactly what the compliance review agent did and why, to own the data that flows through operational workflows, and to insulate business-critical processes from a vendor's unilateral pricing or architectural changes.

The strategic question for a large enterprise is whether the total cost of building and operating private inference, amortised over the three-to-five-year hardware lifecycle, is lower than the total cost of remaining on vendor-hosted services at whatever rate the vendor sets over the same period. Today, for most workloads, the vendor-hosted path is cheaper. The question is whether it will remain cheaper once the market-capture phase ends and the vendor's pricing reflects a more mature competitive position. The history of cloud computing provides a reference point: early cloud pricing was aggressive, and the trajectory over the following decade was a steady increase in effective spend for customers who had become operationally dependent on the platform, partly because the prices themselves rose and partly because usage expanded to fill the available surface.

But the cost comparison misses the deeper issue. A large enterprise that routes its procurement approvals, its contract reviews, its compliance checks, and its customer escalation workflows through a vendor's agentic surface has placed operational accountability in a system it does not control and cannot fully inspect. When a regulator asks how a lending decision was made, or when a board asks why a contract clause was missed, "the agent handled it" is not an answer that satisfies anyone with fiduciary responsibility. The enterprises that are thinking clearly about this are not just running cost models. They are asking who is accountable when an opaque chain of inference steps makes a decision that the business has to defend, and whether the vendor's surface gives them enough visibility to answer that question.

The honest assessment is that large enterprises have options, and those options are expensive and slow. The choice is between paying the vendor on terms the vendor sets while accepting limited visibility into operational processes, or investing in infrastructure that provides independence and auditability but requires capital, talent, and a multi-year commitment. Neither path is free, and the right answer depends on the enterprise's scale, regulatory exposure, risk appetite, and tolerance for vendor dependency over the processes that define how the company actually operates.

The developer tool ecosystem

There is a category of affected parties that does not map neatly onto company size, and it deserves separate treatment. The ecosystem of developer tools, IDE plugins, CI/CD integrations, code review assistants, documentation generators, and every other product that assumed cheap programmatic access to frontier models faces the same structural challenge as the third-party harness category, but arrives at it from a different direction.

These tools assumed something simpler than platform dominance: that programmatic model access would remain affordable enough to embed into the development workflow at multiple points. A CI pipeline that runs an AI code reviewer on every pull request, a documentation generator that updates API docs on every merge, a test generator that produces new test cases on every feature branch: each of these integrations assumed that the per-call cost of model inference was low enough to treat as overhead. Under the new pricing, it is not. Each of those calls now draws from a metered credit pool, and the aggregate cost of running an AI-augmented development workflow across a mid-size team becomes a variable that has to be actively managed rather than absorbed into the subscription.

The tools that survive this transition will be the ones that can justify their metered cost in measurable terms, demonstrating enough value per API call to warrant the spend. The tools that cannot, and I suspect there are many, will either retreat to simpler heuristic approaches that do not require model inference, or they will become features of the vendor's first-party harness rather than independent products, absorbed into the same consolidation dynamic that is pulling the rest of the ecosystem inward.

Open-source and local inference

The open-source and local-inference ecosystem is now the marginal substitute for high-volume routine work in a way that it was not two weeks ago. The economics of running a frontier model through the metered path on high-volume routine workloads have become severe enough that the quality-cost frontier of open models is genuinely competitive for a meaningful share of the agentic surface area.

The accurate way to describe what is happening is as a stratification along the workload axis.15The stratification is along the quality-cost frontier: routine, well-defined, repeatable tasks migrate to cheaper models where the quality gap is acceptable, while cognitively demanding tasks stay on frontier models. The result is a rational allocation of workload across providers based on the new cost structure, and it will accelerate as open-model quality continues to improve. High-volume routine work, the kind of agentic task that is well-defined, repeatable, and does not require the frontier model's full reasoning capability, moves to cheaper or local models because the economics now demand it. Cognitively expensive work, long-horizon multi-file reasoning, security-sensitive editing, complex architectural decisions, stays on frontier models because nothing else is good enough yet.

The abstraction tools that make this stratification practical, OpenRouter, LiteLLM, Portkey and similar routing layers, continue to work because they operate on API keys rather than on the subscription path that the vendors have closed.16OpenRouter, LiteLLM, and Portkey are model-routing and abstraction layers that sit between the developer and multiple model providers. They operate on API keys rather than on the subscription path, which means they are not directly affected by the subscription pool changes. They enable workload routing across providers based on cost, latency, or capability criteria. Their continued viability depends on the API path remaining available, which both vendors have preserved for paid API key holders. A developer or team willing to manage the complexity of routing different workloads to different models based on task characteristics can recapture some of the economic flexibility that the vendor consolidation is designed to remove. The trade-off is operational complexity, because maintaining routing logic, evaluating model quality across providers, and managing multiple API relationships is substantial overhead, and most small teams will not find it worth the effort.

The open-source path is viable but conditional. DeepSeek V3, Gemma 4, Qwen 3, and Llama are competitive for routine agentic tasks. The quality gap on the hardest work remains substantial. The practical question for any team is where on the workload spectrum the crossover sits, and whether the operational complexity of managing multiple model paths is worth the economic benefit.


The regulatory horizon

There is a regulatory dimension to this shift that has a longer time horizon than the immediate commercial impact and deserves to be flagged separately.

A price discrimination of fifty to one hundred multiples between subscription and API access, combined with first-party harness preference and the structural weakening of third-party competitors, is exactly the conduct shape that the EU Digital Markets Act is designed to scrutinise in established gatekeepers.17The EU Digital Markets Act (DMA) is designed to prevent self-preferencing, unfair pricing, and anti-competitive bundling by designated gatekeepers. Neither Anthropic nor Google currently meets the gatekeeper thresholds for AI foundation model services, and the European Commission is still adapting the regulatory framework to cover AI-specific market structures. The trajectory of regulatory expansion, however, suggests designation within the next few years. The conduct pattern being established now fits the DMA's target profile. Neither Anthropic nor Google currently meets the DMA gatekeeper thresholds for foundation model services, and the act is still being adapted to AI, but the trajectory is clearly toward designation within a few years, and the conduct that is being normalised now will be the conduct under examination when designation arrives.

Organisations operating in regulated industries, particularly those in jurisdictions where AI sovereignty concerns are politically active, should treat this as a live question rather than as a distant one. The procurement decisions made in 2026 will be the contractual positions defended in 2028 when the regulatory framework catches up to the market structure it is designed to address. The European conversation about digital sovereignty, which has until now focused primarily on cloud infrastructure and data residency, is beginning to extend to the AI inference layer, and the vendor consolidation described in this piece will accelerate that extension.


The official rationale of "headless consumes more tokens" is technically correct and operates as honest cover for a coordinated re-pricing of the entire programmatic surface with three downstream effects neither company is naming openly. It pulls agentic workflow into first-party harnesses where the telemetry has long-term model-improvement value and where the data feedback loops compound across generations. It cuts the economic legs out from under the third-party harness category before it can mature into a platform layer that would commoditise the foundation model. And it rebuilds a customer segmentation curve that the original subscription pricing had collapsed, restoring the relationship between consumption and cost that a flat-rate subsidy had been suppressing.

Google executed the move through product architecture, deprecating the script-friendly CLI in favour of an agent-first surface that is structurally harder to wrap. Anthropic executed it through pricing, leaving the product intact and re-metering the heavy path. Same destination, different lever, same week.

The deeper question that the events of this week pose is whether the agent layer is going to look more like the cloud market of 2010, where competition between hyperscalers remained intense and customers retained meaningful negotiating power even as switching costs accumulated, or more like the mobile platform market of 2015, where two vendors had effectively divided the world between them and the third-party developer ecosystem operated on terms the platforms set. The evidence of the past week points more toward the latter than the former, and the strategic response that an organisation builds today should be calibrated to that destination rather than to the more comfortable one.

The vendors have shown their hand. The cost of not choosing is now visible in a way it was not a month ago.

Conclusion

Choose your stack carefully, because once you are in, it will be increasingly difficult to shift. The switching costs accumulate fast: every workflow you build, every process you wire through a vendor's agentic surface, every policy check and approval chain that runs on their inference becomes a dependency that gets harder to unwind with each passing quarter.

Before you commit, think through what you are actually handing over:

  • Accountability. When an agentic workflow makes a consequential decision inside your business, who is responsible? Can you trace the chain of steps that led to that decision, or is it opaque by design?
  • Privacy and trade secrets. Every document, every email, every contract, every internal communication that flows through the vendor's surface is data the vendor has access to. Your competitive intelligence, your customer data, your negotiation positions, your internal strategy — all of it passes through infrastructure you do not own and cannot audit.
  • Governance. Can your compliance team explain to a regulator how a decision was made when the decision was produced by a chain of inference steps inside a third-party system? Can your legal team satisfy a discovery request when the process that generated the output is controlled by the vendor?
  • Risk. Your operational processes now depend on a third party's pricing decisions, architectural changes, and service continuity. A thirty-day notice can change the cost structure of workflows your business relies on daily.
  • Fluctuating costs. The subsidy era trained organisations to treat AI tooling as a fixed cost. It is now a variable cost that scales with usage, and the vendor sets the rate. Budgeting for processes whose cost you cannot predict and whose internal complexity you cannot see is a risk most finance teams have never had to model.
  • Process dependency. The deeper your workflows are embedded in a vendor's harness, the more your operational continuity depends on decisions made in a boardroom you have no seat in. The vendor's incentives and your organisation's incentives are aligned today. They may not be tomorrow.

The period of artificial market fluidity is over. The vendors have made their moves. The question for every organisation is whether to make a deliberate choice now, while the transition window is still open and the switching costs are still manageable, or to wait until the next round of pricing changes forces the issue on someone else's terms.