Marco Santelli

In this article

What both vendors are actually doing — the four strategic moves behind the announcements. For everyone.
The pricing wedge — how a 50–100× cost differential becomes the primary lock-in mechanism. For finance, FinOps, and procurement leads.
Why MCP is not the answer — the open protocol argument and why it does not hold up. For technical architects and product builders.
Who pays the price, and how — select your segment:
- Product builders and startups — the orchestration thesis collapses overnight
- Mid-size companies — every workflow step gets metered invisibly
- Large enterprises — negotiate, self-host, or absorb the cost
- The developer tool ecosystem — cheap programmatic access disappears
- Open-source and local inference — the marginal substitute for routine work
The regulatory horizon — DMA, price discrimination, and what comes next. For legal, compliance, and executive leadership.
Conclusion — choose your stack carefully. For everyone.

Anthropic's pricing announcement landed on a Tuesday and, coincidentally or not, Google's arrived three days later with a similar stance. By the end of the week it was clear that these were not two independent product decisions. The timing, and the specific shape of what each company chose to leave open and what it chose to close, pointed at the same strategic move executed through different levers.

On 13 and 14 May 2026, Anthropic announced that effective 15 June the Claude Agent SDK, the non-interactive claude -p CLI invocation, the Claude Code GitHub Actions integration, and all third-party agents authenticating through Claude subscriptions would move out of the standard subscription rate-limit pool and onto a separate dollar-denominated credit pool: twenty dollars for Pro, one hundred dollars for Max 5x, two hundred dollars for Max 20x, metered at full API list prices, with credits that do not roll over.¹

¹ Anthropic, "Claude Code: Agentic Usage and Billing Changes," 13 May 2026. The credit pool applies to the Claude Agent SDK, claude -p non-interactive CLI, Claude Code GitHub Actions integration, and all third-party agents authenticating through Claude subscriptions. Credits are metered at full API list prices and do not roll over between billing cycles. Effective 15 June 2026. Interactive Claude.ai, interactive Claude Code in the terminal, and the Cowork desktop product remain unchanged on the subscription pool.

A few days later, Google published its own announcement: Gemini CLI and the Gemini Code Assist IDE extensions would stop serving Google AI Pro, Ultra, and free Code Assist users on 18 June, replaced by the new Antigravity CLI. Gemini CLI would remain accessible only through paid API keys or the Gemini Enterprise Agent Platform.²

² Google, "Introducing Antigravity CLI," 19 May 2026. Effective 18 June 2026. Gemini Code Assist Standard and Enterprise licence holders, and Gemini Code Assist for GitHub customers through Google Cloud, retain unchanged access. Consumer-tier users are migrated to the Antigravity CLI, which is built on the Antigravity Shared Agent Engine, a server-side multi-agent runtime.

Is it a coincidence that the two announcements came within days of each other, and that both changes take effect in the same week of June?

The official rationale offered for both moves is sensible and partly correct:

Anthropic's rationale, articulated by Boris Cherny in interviews and on social media, is that third-party tools operating outside the prompt cache system are difficult to support sustainably, that subscription pricing was never designed for continuous automated demand, and that some users were running token consumption worth hundreds or thousands of dollars on twenty-dollar plans through tools like OpenClaw and similar third-party harnesses.³

³ Boris Cherny, Anthropic product lead for Claude Code, in interviews and social media posts accompanying the announcement. The specific example cited was users proxying agentic workloads through twenty-dollar Pro subscriptions via third-party harnesses like OpenClaw, generating token consumption worth hundreds or thousands of dollars at API list prices. The prompt cache system that makes interactive sessions economically viable does not cover headless agentic sessions orchestrated by external tools.

Google's rationale skirts the pricing question entirely, presenting the change as a product evolution: workflows had outgrown a single-agent terminal CLI and now required multi-agent orchestration with a unified backend.

Both rationales are factually accurate as far as they go. They are also doing strategic work that the public communications are careful not to name. What I want to walk through in this piece is what these two announcements actually mean when you read them together, why the popular counterarguments to the lock-in concern do not hold up, and what the consequences look like for the different kinds of organisations and builders who now have to decide what to do about it.

What I would urge anyone reading this, particularly those making infrastructure and process decisions for organisations rather than individual choices for themselves, is to recognise that the period of artificial market fluidity that characterised the first eighteen months of agentic AI has ended. The choices are still real, the open layers still exist, and the open-source models are still available. What has gone away is the subsidy that was making certain choices artificially cheap, and the comfortable assumption that one could defer the strategic question of vendor commitment indefinitely while the market sorted itself out. The market has sorted itself out. The vendors have chosen. And the evidence below should help clarify what that means for your specific situation.

What both vendors are actually doing

- re-pricing: Moving headless agentic usage from flat-rate subscription to metered API list prices, aligning cost with consumption
- first-party capture: Funnelling workflows into the vendor's own harness (the only subsidised path) to accumulate behavioural telemetry
- third-party suffocation: Removing the economic viability of independent orchestration tools by ending flat-rate programmatic access
- customer segmentation: Rebuilding the monetisation curve: cheap interactive, metered programmatic, enterprise API contracts

1. The first move, common to both companies, is a re-pricing of the heaviest workload at something closer to its true marginal cost. Headless agentic usage is by a substantial margin the largest per-user token consumer in any AI vendor's product mix. It bypasses the prompt cache optimisations that make interactive sessions economical to serve, and pricing it at the same flat rate as a chat-grade interactive workload was a transitional subsidy that could not survive sustained agentic demand. Re-pricing it at API list rates, or moving it off the interactive surface entirely, is the correct economic response in isolation, and it would be cynical to pretend the capacity argument has no weight given the public data about agentic demand outpacing GPU supply.

2. The second move, less openly discussed, is the deliberate capture of agentic workflow inside first-party surfaces. Both companies are leaving exactly one path wide open and closing every other one. For Anthropic, that path is interactive Claude Code running in the terminal, which stays on the unchanged subscription pool. For Google, that path is Antigravity 2.0 and its CLI, a unified, server-side, multi-agent harness designed around the Antigravity Shared Agent Engine. The economic effect is that any developer who wants high-volume agentic work without paying API rates has one rational choice: live inside the vendor's own harness.

The reason this matters more than the immediate inference revenue is that every accepted suggestion, every rejected diff, every plan-mode exploration, and every tool call inside the first-party harness becomes a behavioural signal that feeds the next model generation's reinforcement learning loop, and that signal is worth more to long-term competitive position than the inference revenue it displaces.⁴

⁴ The telemetry argument: every interaction inside the first-party harness, including accepted and rejected code suggestions, plan-mode explorations, tool call patterns, and sub-agent configurations, generates behavioural data that feeds the vendor's reinforcement learning pipeline for the next model generation. This data is not available to the vendor from third-party harness usage, even when the underlying model is the same. The competitive value of this signal accumulates across model generations and compounds over time.

3. The third move, and the one I find genuinely existential for the broader ecosystem, is the pre-emptive suffocation of the third-party harness category before it becomes the platform layer. Through 2025 and into early 2026 a credible thesis was building that the durable value in the AI stack would migrate up from the foundation model to the orchestration and harness layer, embodied by projects like OpenClaw, Goose, Conductor, Zed's agent mode, Jean, and a long tail of open-source frameworks.⁵

⁵ The third-party harness and orchestration category includes OpenClaw, Goose (Block), Conductor, Zed's agent mode, Jean, and numerous open-source frameworks. The shared thesis: if the orchestration layer abstracts model choice, the foundation model commoditises, and the harness owner captures customer relationships, integrations, and workflow lock-in. This thesis was credible through early 2026 and had attracted significant venture investment.

The logic was straightforward: a harness that abstracts model choice commoditises the foundation model underneath, and the harness owner accumulates the customer relationships, integrations, and workflow lock-in that the model vendor cannot easily disintermediate. By pulling the subscription subsidy in the same week, both vendors make the first-party harness the only economically rational choice for any serious volume of agentic work. The third-party harness market freezes in place before it can build distribution moats.

4. The fourth move is a customer segmentation that the previous pricing structure was actively preventing. Under the old regime, a single twenty-dollar Pro subscription could absorb the token consumption of what was effectively a small engineering team's worth of automated work, keeping the monetisation curve flat at exactly the segment that most needed differentiation. The new structure rebuilds that curve: consumer interactive usage stays cheap and capped, programmatic usage moves onto a metered credit that exposes true cost to the user, and the upper tier of headless workloads is reserved for enterprise API contracts or paid API key consumption. Google's segmentation is even more explicit. Gemini Code Assist Standard and Enterprise licence holders, and Code Assist for GitHub customers through Google Cloud, retain unchanged access, while consumer-tier users are pushed off the headless surface entirely.⁶

⁶ Google's segmentation is explicit in the announcement: Gemini Code Assist Standard and Enterprise licence holders retain access to Gemini CLI. Consumer-tier users (Google AI Pro, Ultra, free Code Assist) are migrated to Antigravity CLI. The effect is a clean separation between enterprise customers on negotiated contracts and consumer customers on list pricing, with the consumer tier bearing the full cost of the transition.

Taken together, the four moves mark the end of the flat-rate subscription era for agentic AI and the beginning of a deliberate differentiation strategy. Each vendor is now constructing a first-party agent surface optimised for its own model, accumulating telemetry that only its own model benefits from, and training its users into workflows that do not transfer cleanly across vendor boundaries. The era of "the agent is just a wrapper over any model" is ending, and the era of "the agent is the vendor's product and the model is its engine" is beginning.

The pricing wedge

- pricing wedge: The 50–100× cost differential between first-party subscription and API list rates, the primary mechanism of vendor lock-in
- price discrimination: Charging different rates for identical inference based on whether the customer uses the vendor's harness or a third-party tool
- subsidy erosion: The risk that today's aggressive subscription pricing will rise once the vendor has captured sufficient market share

The framing of this shift as vendor lock-in deserves to be made precise, because the technical conversation tends to treat lock-in as a binary state rather than as an economic gradient with a specific mechanism. The mechanism here is price discrimination, and the magnitudes are stark enough that the word lock-in is justified despite the absence of any technical incompatibility between vendors.

A developer running high-volume agentic work inside a Claude Code Max 20x subscription pays two hundred dollars a month for what generates effective token consumption in the range of several thousand dollars at API list prices. A developer running the same workload through a third-party orchestrator paying API rates, even at the relatively favourable blended rate of around one and a half dollars per million tokens that some routers offer, can easily accumulate seven hundred dollars per day on a serious autonomous coding workload, which annualises to a quarter of a million dollars per developer.⁷

⁷ Estimate based on publicly available API pricing and reported workload patterns. Max 20x subscription: $200/month, $2,400/year. API path at blended rate of approximately $1.50 per million tokens: sustained high-volume agentic coding workloads have been reported to consume token volumes that translate to approximately $700/day, annualising to roughly $250,000 per developer. The 50–100× differential is the range across different workload intensities and blended rates.

The differential between the first-party subscription path and the third-party orchestrator path is a fifty to one hundred multiple, large enough that no third-party orchestrator can absorb it regardless of how superior its product is on quality, ergonomics, or integration breadth. The lock-in is enforced by a deliberate pricing wedge that the vendors set and adjust at will, with switching cost and technical incompatibility playing supporting roles to the price differential rather than carrying the weight themselves.
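As a rough check on that multiple, the arithmetic can be sketched in a few lines. The three constants come from the estimate above (two hundred dollars a month, seven hundred dollars a day, one and a half dollars per million tokens); the function names are mine, and the result describes only this one heavy workload, which sits at the top of the quoted range.

```python
# Figures are the article's own estimates, not vendor data.
SUBSCRIPTION_PER_MONTH_USD = 200.0   # Max 20x plan
API_SPEND_PER_DAY_USD = 700.0        # heavy autonomous coding workload
BLENDED_RATE_USD_PER_MTOK = 1.50     # router rate per million tokens


def annual_subscription_cost(per_month: float) -> float:
    """Yearly cost of the flat-rate, first-party path."""
    return per_month * 12


def annual_api_cost(per_day: float, days: int = 365) -> float:
    """Yearly cost of the metered, third-party path."""
    return per_day * days


def implied_mtok_per_day(per_day: float, rate_per_mtok: float) -> float:
    """Daily token volume, in millions, implied by a daily spend."""
    return per_day / rate_per_mtok


subscription = annual_subscription_cost(SUBSCRIPTION_PER_MONTH_USD)
api = annual_api_cost(API_SPEND_PER_DAY_USD)
multiple = api / subscription
mtok = implied_mtok_per_day(API_SPEND_PER_DAY_USD, BLENDED_RATE_USD_PER_MTOK)

print(f"subscription path: ${subscription:,.0f} per year")
print(f"API path:          ${api:,.0f} per year")
print(f"multiple:          {multiple:.0f}x")
print(f"implied volume:    {mtok:.0f}M tokens per day")
```

Lighter workloads or cheaper blended rates pull the multiple down towards the lower end of the range, which is why the differential is better quoted as a band than as a single figure.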

There is an important secondary point that follows from this. The lock-in is conditional on the vendors maintaining the current subsidy on their interactive surfaces, and that subsidy is vulnerable to being cut back, through higher prices or tighter caps, once the market has been sufficiently captured. The current pricing reflects an aggressive market-share posture rather than a stable equilibrium, and any business strategy that assumes today's prices will hold next year is making an assumption about vendor strategic restraint that the history of comparable platform moves does not support. The two hundred dollar Max 20x of today could plausibly become the four hundred dollar tier of next year, and the businesses and teams that depend on the subsidy continuing at its current level are taking a position on vendor pricing rather than on technology choice.

Why MCP is not the answer to this

- MCP: Model Context Protocol, an open standard for how models invoke tools and exchange context, adopted by Anthropic, Google, and OpenAI
- protocol vs gate: MCP standardises the integration layer, but the vendor controls which tools get invoked and under what commercial terms
- monetisation asymmetry: MCP servers are passive and invisible to the user; they cannot extract rent, making standalone MCP businesses unviable

The Model Context Protocol, which Anthropic published as an open standard, Google has adopted across the Antigravity stack, and OpenAI has integrated into its Agents SDK, looks at first glance like the counterargument to the lock-in framing.⁸ If the integration layer between models and tools is genuinely open and cross-vendor, then surely the lock-in concern is overstated, and the third-party tool ecosystem can survive by sitting on top of MCP and serving whichever harness the customer happens to be using. This is the most common rejoinder I hear when raising the lock-in question, and I think it is wrong in a specific way that is worth walking through.

⁸ The Model Context Protocol (MCP) was published by Anthropic as an open specification in late 2024. Google adopted MCP across the Antigravity stack and announced managed MCP servers through Apigee at Cloud Next 26. OpenAI integrated MCP support into the Agents SDK. The protocol is now the de facto standard for tool integration across all three major vendors.

MCP is open at the protocol level and strategically inert at the economic level, and the two facts coexist without contradiction because they operate at different layers of the stack. MCP describes how tools are invoked by a model and how they communicate context back, and it does this in a way that is genuinely portable across vendors. What MCP does not describe, and what it cannot address through any technical means, is who decides which tool gets invoked and under what economic conditions. That decision sits with the model running inside the harness, the harness sits inside the vendor's commercial envelope, and the vendor sets the pricing wedge between using its harness and using a competitor's. The openness of MCP is genuine, but it operates above the gate, not at the gate.

There is a further problem specific to MCP that shapes what businesses can be built on it. MCP servers are invoked by the model at the model's discretion. They cannot communicate at will, they cannot solicit user attention, they cannot initiate interactions, and they have no agency in the conversation beyond responding when the model decides to call them.⁹ The user perceives a single price for the inference session, the MCP provider is invisible inside that price, and the provider therefore cannot extract rent at the call site.

⁹ MCP servers are invoked by the model at the model's discretion and cannot initiate interactions, solicit user attention, or communicate at will. Server-initiated notifications and sampling (requesting completions back) exist in the protocol specification but are housekeeping channels rather than commercial primitives. The user pays one price for the inference session; the MCP provider is invisible inside that price. The marginal cost of building an MCP server with AI assistance has dropped to roughly half a day of engineering work, while marginal revenue per invocation remains zero. This asymmetry makes standalone MCP monetisation structurally unviable.
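To make the passivity concrete, here is a deliberately simplified sketch of the request/response shape a tool server sits behind. It is not the official MCP SDK: the message shapes follow my reading of the specification's JSON-RPC `tools/list` and `tools/call` methods, the `lookup_order` tool is hypothetical, and transport, capability negotiation, tool schemas, and the notification channels are all left out.

```python
import json

# Hypothetical tool registry; the tool name and its behaviour are illustrative.
TOOLS = {
    "lookup_order": lambda args: f"Order {args['order_id']}: shipped",
}


def handle_request(raw: str) -> str:
    """Answer exactly one JSON-RPC 2.0 request; the server never speaks first."""
    req = json.loads(raw)
    method = req.get("method")
    if method == "tools/list":
        # Real servers also return a description and input schema per tool.
        result = {"tools": [{"name": name} for name in TOOLS]}
    elif method == "tools/call":
        params = req["params"]
        text = TOOLS[params["name"]](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        # Standard JSON-RPC "method not found" error.
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})


call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "lookup_order", "arguments": {"order_id": "42"}},
}
print(handle_request(json.dumps(call)))
```

Nothing in this loop gives the server a place to present a price, a brand, or a prompt to the user: it receives a call chosen by the model and returns content that the model folds into its own answer.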

The viable monetisation paths reduce to two categories: wrappers around services that are already independently monetised, like the official Stripe, Salesforce, or GitHub Enterprise MCP servers, where the protocol is an access path to value the customer already pays for; and internal corporate integrations that organisations build for themselves and never sell. The category that does not work is the standalone MCP-as-a-service business, because the marginal cost of producing an MCP server using AI assistance has collapsed to roughly half a day of work while the marginal revenue per call remains structurally zero.

Layered on top of the monetisation difficulty is the vendor's ability to throttle, bias, or replace any MCP that becomes commercially interesting. The vendor controls invocation through the model. The vendor can fine-tune the model to prefer first-party tools over functionally equivalent third-party MCPs. The vendor can build paid first-party features that occupy the same functional niche as a popular free MCP and route the value through its own billing rather than through the ecosystem.¹⁰ The pattern is already visible across spreadsheets, browsers, calendars, code review, and document handling, with first-party integrations being shipped at exactly the points where the MCP ecosystem might otherwise have monetised.

¹⁰ The pattern of first-party integration displacing third-party MCP providers is visible across several categories: spreadsheet interaction, web browsing, calendar management, code review, and document handling. In each case, the vendor ships a built-in integration that occupies the same functional niche as a popular community MCP server, routing the value through its own billing and feature set. The effect is systematic enclosure of the integration surface, with the boundary between first-party and third-party territory set by the vendor's roadmap.

This is the smoke-and-mirrors dynamic. MCP is genuinely useful infrastructure and the protocol does meaningful portability work at the integration layer, particularly for internal corporate use cases that remain its dominant application. What it does not do, and what it is sometimes implicitly claimed to do, is undermine vendor lock-in or threaten the strategic position of the harness layer that sits above it. The protocol is open and the gate above it is closed, and the openness of the one does not affect the closure of the other.

The picture that emerges when you assemble these pieces is that the agent layer of the AI stack closed materially in a single week of May 2026, and the closure has consequences that differ sharply depending on where you sit. The implications for a two-person startup building an agent product are not the same as the implications for a mid-size company that has spent the last year weaving AI into its internal workflows, and neither of those is the same as what a large enterprise with procurement leverage and infrastructure budget faces. What follows is a walk through each of these positions, because I think the differences matter at least as much as the general trend.

Segment	Exposure	Options	Lock-in risk
Startups	Existential — product economics broken by the pricing wedge	Pivot to open-source models, own a deep domain, or negotiate enterprise API terms	Survival risk highest, but switching cost lowest
Mid-size companies	Severe — metered billing inverts automation ROI, no leverage to negotiate	Accept vendor terms, constrain automation to the subscription pool, absorb routing complexity	Highest — already invested, cannot negotiate, cannot self-host
Large enterprises	Significant — long-cycle commitment, but with negotiating power	Volume pricing agreements, private inference, multi-vendor as negotiating position	Moderate — options exist but all require capital
Developer tools	High — products assumed cheap programmatic access	Prove per-call ROI or become features of the vendor's harness	High — economic viability depends on vendor pricing decisions
Open-source path	Opportunity — becomes the marginal substitute for routine work	Host locally or route through abstraction layers at API rates	Low for routine tasks, does not cover frontier reasoning

Product builders and startups

- orchestration thesis: The 2024–2025 VC thesis that durable value would sit in the harness/orchestration layer, commoditising foundation models underneath
- margin compression: Each step in an agentic chain incurs inference cost; as tasks grow complex, per-transaction cost outpaces what customers will pay
- compounding error: Probabilistic steps in a chain multiply failure rates: 95% accuracy per step yields ~70% end-to-end over seven steps

Through 2024 and 2025, venture capital flowed into the agent and orchestration layer on the thesis that the harness would become the durable value capture point in the AI stack. Startups building coding assistants, workflow automation tools, AI-powered CI/CD pipelines, and multi-agent frameworks were all operating on the same structural assumption: that cheap subscription tokens would remain available for programmatic use, and that the startup could differentiate on orchestration quality, domain expertise, or developer experience while the foundation model underneath became increasingly commoditised.¹¹

¹¹ The agent and orchestration startup wave of 2024–2025 was built on the assumption that foundation model access would remain cheap enough to embed into third-party products at scale. Funding rounds across the category reflected this thesis: orchestration-layer companies, multi-agent framework builders, and AI-native developer tools raised on valuations that assumed sustainable unit economics at subscription-equivalent rates. The pricing changes of May 2026 invalidate the economic premise without changing the product quality.

That assumption died in a week. The pricing wedge means that any product built on top of a vendor's model through the API path now faces a cost structure that the subscription-path competitor, the vendor's own first-party harness, will always undercut by a factor of fifty to a hundred. The startup cannot match the price. It cannot absorb the differential. And it cannot pass the cost to its customers without making its product dramatically more expensive than the vendor's own offering for what appears, from the buyer's seat, to be similar functionality.

The survivors, as I see it, will fall into three categories.

- Negotiated API economics. Companies that had already built enterprise API relationships with direct pricing agreements, where the economics are negotiated rather than list-rate.
- Open-source and local inference. Companies that pivot to open-source and locally-hosted models, accepting the quality trade-off on the hardest tasks in exchange for economic independence from the vendor's pricing lever.
- Deep domain ownership. Companies that own a domain narrow enough and deep enough that the vendor is unlikely to enter it directly, where the product's value lives in the domain expertise rather than in the orchestration, and where the foundation model is a component rather than the product.

Everyone else is holding a product whose unit economics just changed by two orders of magnitude, and the change is not temporary.

But the cost problem is only the first layer. Step back and ask what the startup is actually offering its customers. The product is a service that takes an input, runs it through a sequence of agentic steps, and delivers an output. The customer pays a fee per transaction or a monthly subscription, and the startup's margin is the difference between that fee and the cost of the inference chain underneath. Under the old subsidy, those chains were cheap enough that the margin arithmetic worked. Under API list rates, it does not, and it gets worse as complexity grows.

Take a concrete example. A startup sells an automated customer support product. A single inbound email triggers an agent that reads the message, retrieves the customer's history, classifies the intent, drafts a response, checks it against policy, and sends it. That is six or seven inference calls at minimum, and each one runs at API list price. If the blended cost per call is even a few cents, each email costs the startup tens of cents to handle, and a customer paying five or ten dollars a month becomes unprofitable after a few dozen emails. The startup is losing money on every reasonably active user. The product that looked like a high-margin SaaS business under the subsidy turns out to be a premium service with shrinking or negative margins at scale.¹²

¹² The margin compression is structural rather than operational. Each step in an agentic chain incurs an inference cost that the startup cannot reduce without degrading the product. The number of steps is determined by the complexity of the task, not by the startup's engineering. As tasks grow more complex, the cost per transaction increases faster than the price the customer is willing to pay, creating a negative margin curve at exactly the point where the product should be scaling.
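The margin arithmetic in that example is easy to sketch. Every number below is a placeholder of my own choosing: seven calls per email, three cents per call as a stand-in for "a few cents", and the ten-dollar end of the subscription price. None of them is a real vendor price.

```python
# All figures are hypothetical placeholders, not vendor prices.
CALLS_PER_EMAIL = 7           # the text says six or seven; seven assumed
COST_PER_CALL_USD = 0.03      # stand-in for "a few cents" per inference call
PRICE_PER_MONTH_USD = 10.0    # what the customer pays for the service


def cost_per_email(calls: int, cost_per_call: float) -> float:
    """Inference cost of handling one inbound email."""
    return calls * cost_per_call


def monthly_margin(emails: int, price: float, calls: int, cost_per_call: float) -> float:
    """Subscription revenue minus inference cost for one customer-month."""
    return price - emails * cost_per_email(calls, cost_per_call)


def break_even_emails(price: float, calls: int, cost_per_call: float) -> float:
    """Monthly email volume at which the margin reaches zero."""
    return price / cost_per_email(calls, cost_per_call)


threshold = break_even_emails(PRICE_PER_MONTH_USD, CALLS_PER_EMAIL, COST_PER_CALL_USD)
print(f"break-even volume: {threshold:.1f} emails per month")
for emails in (10, 50, 200, 1000):
    margin = monthly_margin(emails, PRICE_PER_MONTH_USD, CALLS_PER_EMAIL, COST_PER_CALL_USD)
    print(f"{emails:>5} emails -> margin ${margin:,.2f}")
```

Under these assumptions the customer stops being profitable at roughly 48 emails a month, and every email beyond that deepens the loss, which is the sense in which the margin shrinks as usage grows.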

And that is before you account for the quality problem, which compounds alongside the cost. I dissected this in detail in "When autonomous agents are the wrong answer", but the short version is that each step in an agentic chain is probabilistic. The model might misclassify the intent, retrieve the wrong context, draft a response that misreads the tone, or fail a policy check that a human would have passed. The error rate at each step may be low, but errors compound across a chain: a sequence of seven steps each operating at ninety-five percent accuracy delivers a correct end-to-end result only seventy percent of the time. The startup is selling a product that costs more per transaction than the customer expects to pay, and that produces the wrong output roughly three times in ten on non-trivial tasks, and the customer has no visibility into how many steps ran, what they cost, or where the errors were introduced, because the chain is opaque by design.
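The seventy percent figure follows from multiplying the per-step success rates together. The sketch below assumes the steps fail independently of one another, which is my simplification; correlated failures in a real chain would shift the numbers.

```python
def end_to_end_accuracy(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


for steps in (1, 3, 7, 12):
    accuracy = end_to_end_accuracy(0.95, steps)
    print(f"{steps:>2} steps at 95% each -> {accuracy:.1%} correct end to end")
```

At seven steps this gives just under seventy percent, and at a dozen steps it falls to a little over half, so chain length matters as much as per-step quality.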

This is what the startup is actually building: a premium-priced, probabilistic workflow where the margins decrease as usage grows and the quality degrades as complexity increases, all running on infrastructure whose pricing is set unilaterally by a vendor who also competes with the startup's product directly.

The structural problem for startups: The vendor's first-party harness will always be subsidised below the cost that any third-party product built on API rates can match. This is not a gap that better engineering or a superior user experience can close, because the gap is in the pricing, not in the product. And the product itself, a probabilistic chain of opaque inference steps with compounding error rates and no customer-visible cost breakdown, is a harder sell than the demo ever suggested.

Mid-size companies and the metering trap

- metering trap: The inversion where automation ROI turns negative: the more a company automates, the more it pays under metered billing
- opaque agent chains: Sequences of inference steps inside the vendor's architecture that the customer cannot see, audit, or simplify
- per-seat → per-token: The shift from fixed subscription costs to variable usage-based billing that scales with workflow complexity

The segment I find most exposed by this shift is the mid-size company, roughly one hundred to a few thousand employees, that spent the past two years weaving AI into its operational fabric: customer service workflows, procurement approvals, compliance checks, HR onboarding, contract review, internal policy enforcement, and the dozens of cross-departmental processes that keep a company running day to day.

These companies face a specific problem that the startup does not share and that the large enterprise can absorb. Under the previous pricing, AI tooling was a per-seat subscription cost: predictable, budgetable, and indifferent to how intensively any department used the tool. A company running agentic workflows across operations, finance, and customer support paid a known monthly line item per seat, and the more those teams automated through the tool, the better the return on that line item looked. Automation made the subscription more valuable, not more expensive.

Under the new pricing, the moment any of those workflows crosses into the programmatic surface, into automated approvals, into policy-checking agents, into any of the headless patterns that the most operationally ambitious companies have built over the past year, the cost model inverts.¹³ What was a fixed per-seat cost becomes a usage-based cost that scales with the volume of automated work. The more a company automates, the more it pays. The return-on-investment calculation for automation, which was straightforward under the subscription model, now contains a variable cost component that grows with the automation's own success.

¹³ Under subscription pricing, the cost of AI tooling was a fixed per-seat expense indifferent to usage intensity. Under metered credit pricing, the cost scales with the volume of programmatic work. For companies that had moved significant operational workflow into automated pipelines, the transition from per-seat to per-token billing can multiply the effective cost by an order of magnitude or more, depending on automation intensity.

To see why this matters beyond the billing page, consider a workflow that most companies will recognise. An email arrives from a customer. An agent reads it, parses the intent, searches for related context across the company's systems, evaluates what the customer needs, and prepares a handoff for the next agent in the chain. That second agent checks the account history, reviews the relevant policy, drafts a response, and passes it to a third agent for tone and compliance review before a visible action is taken, perhaps a reply, perhaps a support ticket, perhaps an escalation to a person. Between the email arriving and the response leaving, a dozen or more internal steps ran, each consuming tokens, each metered, and each probabilistic rather than deterministic in a process the business treats as reliable (I wrote about this tension in "The friction between probabilistic and deterministic").

The customer saw one interaction. The company, depending on the vendor's billing granularity, may or may not see the full chain underneath. In a subscription world, the cost of that chain was folded into a flat monthly fee and nobody thought about it. In a metered world, the cost scales with the number of internal loops the vendor's agent architecture runs, and that architecture is something the customer did not design, cannot inspect, and has no way to simplify.
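A rough way to picture how one visible interaction turns into many metered steps is to add up the tokens across the chain. This is an illustrative sketch under stated assumptions: the step names, token counts, repeat counts, and per-million-token prices below are all hypothetical, and real vendor metering may count things differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One internal agent step; every field here is a hypothetical estimate."""
    name: str
    input_tokens: int
    output_tokens: int
    repeats: int = 1  # internal loops the agent architecture runs for this step


def chain_tokens(steps: list[Step]) -> tuple[int, int]:
    """Total (input, output) tokens consumed across the whole chain."""
    total_in = sum(s.input_tokens * s.repeats for s in steps)
    total_out = sum(s.output_tokens * s.repeats for s in steps)
    return total_in, total_out


def interaction_cost(steps: list[Step], input_price: float, output_price: float) -> float:
    """Cost of one customer-visible interaction, prices in dollars per million tokens."""
    total_in, total_out = chain_tokens(steps)
    return (total_in * input_price + total_out * output_price) / 1_000_000


if __name__ == "__main__":
    # Placeholder chain loosely following the email example; not measured data.
    chain = [
        Step("parse intent", 2_000, 300),
        Step("search context", 6_000, 500, repeats=3),
        Step("check policy", 8_000, 400),
        Step("draft reply", 5_000, 800, repeats=2),
        Step("compliance review", 4_000, 200),
    ]
    cost = interaction_cost(chain, input_price=1.0, output_price=4.0)
    print(f"one email, {sum(s.repeats for s in chain)} metered steps, ${cost:.4f}")
```

The `repeats` field is the part the customer does not control: if the vendor's architecture runs more internal loops per step, the cost of the same visible interaction rises without anything changing on the customer's side.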

This is where the shift stops being a pricing adjustment and becomes a business model problem. These companies budgeted for SaaS, a known cost per seat per month, and they are now exposed to a cost curve that scales with the complexity of processes they cannot see into. The CFO who approved the per-seat subscription understood the number. The metered credit that replaces it for programmatic workloads introduces a variable that has to be modelled, monitored, and capped, and the institutional capacity to do that does not exist in most mid-size companies because they have never needed it before.

The lock-in compounds the problem. A mid-size company lacks the scale to negotiate enterprise pricing agreements directly with Anthropic or Google. It lacks the infrastructure budget and talent to build private inference capability. And it has already accumulated switching costs in the form of agent configurations wired into procurement workflows, compliance review chains, customer service routing, onboarding sequences, and institutional knowledge built around a specific vendor's harness, all of which would need to be rebuilt if it moved. The practical result is that these companies are captive to the vendor's pricing decisions in a way that startups, which can pivot to open-source models before the switching costs accumulate, and large enterprises, which have procurement leverage, are not.

Per-seat → per-token

The shift from subscription to metered credit for programmatic workloads inverts the ROI of automation: the more a company automates, the more it pays

Large enterprises and the hardware question

Operational accountability. Who is responsible when an agentic workflow makes a consequential decision inside a business process the organisation cannot inspect?
Private inference. Running open-source models on dedicated hardware; provides auditability and control but requires tens of millions in capital.
Volume pricing. Negotiated enterprise contracts that reduce the pricing wedge but do not eliminate dependency on the vendor's architecture.

Large enterprises sit in a structurally different position, and the difference is worth being precise about rather than treating all companies as uniformly affected. They have options. But those options touch every part of the organisation, not just the technology department, and the decisions required are as much about governance, accountability, and operational risk as they are about infrastructure.

A company with tens of thousands of employees across multiple offices, departments, and geographies has by now woven AI into processes that span the entire business: legal review, contract management, procurement approvals, HR policy enforcement, regulatory compliance, customer operations, financial reporting, and internal communications. Engineering is one department among many. The exposure to this pricing shift reaches well beyond the IT budget. It raises a question about how the company runs, who is accountable when an agentic workflow makes a consequential decision inside a business process, and what happens when the cost of those workflows changes by an order of magnitude on thirty days' notice.

The first option is negotiated volume pricing. A large enterprise with an established cloud procurement function can negotiate directly with Anthropic or Google at rates that sit between the subscription subsidy and the list API price. The enterprise commits to volume, the vendor discounts the rate, and both sides get a number they can budget around. The enterprise does not face the same fifty-to-one-hundred-times pricing wedge that a smaller customer on list rates does, because the enterprise has something the vendor wants: a large, predictable commitment that justifies discounting. But a negotiated rate does not resolve the governance problem. The enterprise still cannot see inside the agent chains running across its departments, still cannot audit how many inference steps a compliance review or a contract analysis actually consumed, and still depends on the vendor's architecture decisions for processes that carry real legal and financial accountability.¹³

¹³ Enterprise volume pricing agreements for AI inference follow the same pattern as cloud infrastructure procurement: committed spend in exchange for discounted rates. These agreements reduce the pricing wedge but do not eliminate the structural dependency on the vendor's architecture, telemetry practices, or billing granularity. The enterprise negotiates the rate but not the metering methodology, which means the vendor retains control over how consumption is measured and reported.

The second option, and the one I hear discussed more frequently since the announcements, is private inference. The open-source model landscape has matured to the point where running DeepSeek V3, Gemma 4, Qwen 3, or Llama on dedicated hardware is viable for a meaningful share of operational workloads, particularly for routine, high-volume tasks where the quality gap between open models and frontier models is narrow enough to be acceptable.¹⁴ The economics of this path are real but not trivial. A GPU cluster capable of serving an enterprise's operational workloads across departments requires a capital investment in the tens of millions of dollars, dedicated cooling and power infrastructure, specialised ML operations talent, and a deployment timeline of six to twelve months before the first workload runs. The appeal is not primarily about cost savings. It is about control: the ability to audit every step of every agentic process, to know exactly what the compliance review agent did and why, to own the data that flows through operational workflows, and to insulate business-critical processes from a vendor's unilateral pricing or architectural changes.

¹⁴ As of May 2026, the open-source model landscape includes DeepSeek V3, Gemma 4 (Google), Qwen 3 (Alibaba), and Meta's Llama family. On routine structured tasks, the quality gap between these models and frontier models has narrowed substantially. On long-horizon reasoning, complex multi-step decision chains, and tasks requiring nuanced judgement, frontier models retain meaningful advantages. The crossover point varies by task type and acceptable error rate.

The strategic question for a large enterprise is whether the total cost of building and operating private inference, amortised over the three-to-five-year hardware lifecycle, is lower than the total cost of remaining on vendor-hosted services at whatever rate the vendor sets over the same period. Today, for most workloads, the vendor-hosted path is cheaper. The question is whether it will remain cheaper once the market-capture phase ends and the vendor's pricing reflects a more mature competitive position. The history of cloud computing provides a reference point: early cloud pricing was aggressive, and the trajectory over the following decade was a steady increase in effective spend for customers who had become operationally dependent on the platform, partly because the prices themselves rose and partly because usage expanded to fill the available surface.
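The comparison described above is a cumulative-cost calculation over the hardware lifecycle. Here is a minimal sketch of it, assuming a simplified model in which private inference costs an up-front capital outlay plus a constant annual operating cost, and vendor spend compounds at a fixed annual growth rate. The function names and every figure are hypothetical placeholders; a real model would also account for discounting, utilisation, and hardware refresh.

```python
from itertools import accumulate
from typing import Optional


def private_cumulative(capex: float, annual_opex: float, years: int) -> list[float]:
    """Cumulative cost of private inference at the end of each year."""
    return [capex + annual_opex * year for year in range(1, years + 1)]


def vendor_cumulative(first_year_spend: float, annual_growth: float, years: int) -> list[float]:
    """Cumulative vendor spend when yearly spend compounds at annual_growth."""
    yearly = [first_year_spend * (1 + annual_growth) ** year for year in range(years)]
    return list(accumulate(yearly))


def crossover_year(private: list[float], vendor: list[float]) -> Optional[int]:
    """First year in which cumulative private cost falls below cumulative vendor cost."""
    for year, (p, v) in enumerate(zip(private, vendor), start=1):
        if p < v:
            return year
    return None  # vendor-hosted stays cheaper across the whole horizon


if __name__ == "__main__":
    # Placeholder figures in millions of dollars; illustrative only.
    years = 5
    private = private_cumulative(capex=30.0, annual_opex=5.0, years=years)
    vendor = vendor_cumulative(first_year_spend=10.0, annual_growth=0.5, years=years)
    print("private:", private)
    print("vendor: ", vendor)
    print("crossover year:", crossover_year(private, vendor))
```

The sensitivity that matters is `annual_growth`: with flat vendor spend the crossover may never arrive inside the lifecycle, while a rising effective spend pulls it forward. That is the same question the paragraph poses, expressed as a parameter.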

But the cost comparison misses the deeper issue. A large enterprise that routes its procurement approvals, its contract reviews, its compliance checks, and its customer escalation workflows through a vendor's agentic surface has placed operational accountability in a system it does not control and cannot fully inspect. When a regulator asks how a lending decision was made, or when a board asks why a contract clause was missed, "the agent handled it" is not an answer that satisfies anyone with fiduciary responsibility. The enterprises that are thinking clearly about this are not just running cost models. They are asking who is accountable when an opaque chain of inference steps makes a decision that the business has to defend, and whether the vendor's surface gives them enough visibility to answer that question.

The honest assessment is that large enterprises have options, and those options are expensive and slow. The choice is between paying the vendor on terms the vendor sets while accepting limited visibility into operational processes, and investing in infrastructure that provides independence and auditability but requires capital, talent, and a multi-year commitment. Neither path is free, and the right answer depends on the enterprise's scale, regulatory exposure, risk appetite, and tolerance for vendor dependency over the processes that define how the company actually operates.

The developer tool ecosystem

Per-call ROI. Under metered billing, every API call must justify its cost; tools that cannot demonstrate measurable value per invocation lose viability.
Consolidation. Independent tools get absorbed into the vendor's first-party harness as features rather than surviving as standalone products.

There is a category of affected parties that does not map neatly onto company size, and it deserves separate treatment. The ecosystem of developer tools, IDE plugins, CI/CD integrations, code review assistants, documentation generators, and every other product that assumed cheap programmatic access to frontier models faces the same structural challenge as the third-party harness category, but arrives at it from a different direction.

These tools assumed something simpler than platform dominance: that programmatic model access would remain affordable enough to embed into the development workflow at multiple points. A CI pipeline that runs an AI code reviewer on every pull request, a documentation generator that updates API docs on every merge, a test generator that produces new test cases on every feature branch: each of these integrations assumed that the per-call cost of model inference was low enough to treat as overhead. Under the new pricing, it is not. Each of those calls now draws from a metered credit pool, and the aggregate cost of running an AI-augmented development workflow across a mid-size team becomes a variable that has to be actively managed rather than absorbed into the subscription.

The tools that survive this transition will be the ones that can justify their metered cost in measurable terms, demonstrating enough value per API call to warrant the spend. The tools that cannot, and I suspect there are many, will either retreat to simpler heuristic approaches that do not require model inference, or they will become features of the vendor's first-party harness rather than independent products, absorbed into the same consolidation dynamic that is pulling the rest of the ecosystem inward.

Open-source and local inference

Workload stratification. Routine high-volume tasks migrate to cheaper open models; cognitively demanding work stays on frontier models.
Quality-cost frontier. The trade-off boundary where open-source model quality is acceptable for the task given the cost savings over frontier inference.
Routing layers. OpenRouter, LiteLLM, Portkey: abstraction tools that direct workloads to different models based on cost, latency, or capability.

The open-source and local-inference ecosystem is now the marginal substitute for high-volume routine work in a way that it was not two weeks ago. The economics of running a frontier model through the metered path on high-volume routine workloads have become severe enough that the quality-cost frontier of open models is genuinely competitive for a meaningful share of the agentic surface area.

The accurate way to describe what is happening is as a stratification along the workload axis.¹⁵ High-volume routine work, the kind of agentic task that is well-defined, repeatable, and does not require the frontier model's full reasoning capability, moves to cheaper or local models because the economics now demand it. Cognitively expensive work, long-horizon multi-file reasoning, security-sensitive editing, complex architectural decisions, stays on frontier models because nothing else is good enough yet.

¹⁵ The stratification is along the quality-cost frontier: routine, well-defined, repeatable tasks migrate to cheaper models where the quality gap is acceptable, while cognitively demanding tasks stay on frontier models. The result is a rational allocation of workload across providers based on the new cost structure, and it will accelerate as open-model quality continues to improve.

The abstraction tools that make this stratification practical, OpenRouter, LiteLLM, Portkey and similar routing layers, continue to work because they operate on API keys rather than on the subscription path that the vendors have closed.¹⁶ A developer or team willing to manage the complexity of routing different workloads to different models based on task characteristics can recapture some of the economic flexibility that the vendor consolidation is designed to remove. The trade-off is operational complexity, because maintaining routing logic, evaluating model quality across providers, and managing multiple API relationships is substantial overhead, and most small teams will not find it worth the effort.

¹⁶ OpenRouter, LiteLLM, and Portkey are model-routing and abstraction layers that sit between the developer and multiple model providers. They operate on API keys rather than on the subscription path, which means they are not directly affected by the subscription pool changes. They enable workload routing across providers based on cost, latency, or capability criteria. Their continued viability depends on the API path remaining available, which both vendors have preserved for paid API key holders.
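The routing decision itself can be very small; the overhead lies in maintaining and evaluating it. The sketch below is a hypothetical rule-based router, not the API of OpenRouter, LiteLLM, or Portkey: the `Task` fields, the tier names, and the `max_files_for_local` threshold are all assumptions made for illustration, loosely following the routine-versus-demanding split described above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Hypothetical model tiers; a real setup would map these to concrete endpoints."""
    LOCAL_OPEN = "local-open-model"
    FRONTIER = "frontier-model"


@dataclass(frozen=True)
class Task:
    """Task characteristics the router looks at; all fields are illustrative."""
    name: str
    routine: bool             # well-defined and repeatable
    security_sensitive: bool  # e.g. edits touching auth or secrets
    files_touched: int        # rough proxy for multi-file reasoning


def route(task: Task, max_files_for_local: int = 3) -> Tier:
    """Send demanding or sensitive work to the frontier tier, the rest to the local tier."""
    needs_frontier = (
        task.security_sensitive
        or not task.routine
        or task.files_touched > max_files_for_local
    )
    return Tier.FRONTIER if needs_frontier else Tier.LOCAL_OPEN


if __name__ == "__main__":
    tasks = [
        Task("update API docs on merge", routine=True, security_sensitive=False, files_touched=1),
        Task("refactor auth middleware", routine=False, security_sensitive=True, files_touched=12),
        Task("bulk rename across modules", routine=True, security_sensitive=False, files_touched=40),
    ]
    for t in tasks:
        print(f"{t.name:<28} -> {route(t).value}")
```

A static rule like this is the cheapest possible starting point. The cost the paragraph warns about arrives when the rule has to be validated against actual output quality for each model, which is where most of the ongoing effort goes.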

The open-source path is viable but conditional. DeepSeek V3, Gemma 4, Qwen 3, and Llama are competitive for routine agentic tasks. The quality gap on the hardest work remains substantial. The practical question for any team is where on the workload spectrum the crossover sits, and whether the operational complexity of managing multiple model paths is worth the economic benefit.

The regulatory horizon

EU Digital Markets Act. EU regulation targeting self-preferencing, unfair pricing, and anti-competitive bundling by designated platform gatekeepers.
Gatekeeper designation. Neither Anthropic nor Google yet meets DMA thresholds for AI, but the trajectory points toward designation within a few years.
AI sovereignty. The political push for jurisdictional control over AI inference, extending digital sovereignty from cloud infrastructure to the model layer.

There is a regulatory dimension to this shift that has a longer time horizon than the immediate commercial impact and deserves to be flagged separately.

Price discrimination on the order of fifty to one hundred times between subscription and API access, combined with first-party harness preference and the structural weakening of third-party competitors, is exactly the conduct shape that the EU Digital Markets Act is designed to scrutinise in established gatekeepers.¹⁷ Neither Anthropic nor Google currently meets the DMA gatekeeper thresholds for foundation model services, and the act is still being adapted to AI, but the trajectory is clearly toward designation within a few years, and the conduct that is being normalised now will be the conduct under examination when designation arrives.

¹⁷ The EU Digital Markets Act (DMA) is designed to prevent self-preferencing, unfair pricing, and anti-competitive bundling by designated gatekeepers. Neither Anthropic nor Google currently meets the gatekeeper thresholds for AI foundation model services, and the European Commission is still adapting the regulatory framework to cover AI-specific market structures. The trajectory of regulatory expansion, however, suggests designation within the next few years. The conduct pattern being established now fits the DMA's target profile.

Organisations operating in regulated industries, particularly those in jurisdictions where AI sovereignty concerns are politically active, should treat this as a live question rather than as a distant one. The procurement decisions made in 2026 will be the contractual positions defended in 2028 when the regulatory framework catches up to the market structure it is designed to address. The European conversation about digital sovereignty, which has until now focused primarily on cloud infrastructure and data residency, is beginning to extend to the AI inference layer, and the vendor consolidation described in this piece will accelerate that extension.

The official rationale of "headless consumes more tokens" is technically correct and operates as honest cover for a coordinated re-pricing of the entire programmatic surface with three downstream effects neither company is naming openly. It pulls agentic workflow into first-party harnesses where the telemetry has long-term model-improvement value and where the data feedback loops compound across generations. It cuts the economic legs out from under the third-party harness category before it can mature into a platform layer that would commoditise the foundation model. And it rebuilds a customer segmentation curve that the original subscription pricing had collapsed, restoring the relationship between consumption and cost that a flat-rate subsidy had been suppressing.

Google executed the move through product architecture, deprecating the script-friendly CLI in favour of an agent-first surface that is structurally harder to wrap. Anthropic executed it through pricing, leaving the product intact and re-metering the heavy path. Same destination, different lever, same week.

The deeper question that the events of this week pose is whether the agent layer is going to look more like the cloud market of 2010, where competition between hyperscalers remained intense and customers retained meaningful negotiating power even as switching costs accumulated, or more like the mobile platform market of 2015, where two vendors had effectively divided the world between them and the third-party developer ecosystem operated on terms the platforms set. The evidence of the past week points more toward the latter than the former, and the strategic response that an organisation builds today should be calibrated to that destination rather than to the more comfortable one.

The vendors have shown their hand. The cost of not choosing is now visible in a way it was not a month ago.

Conclusion

Choose your stack carefully, because once you are in, it will be increasingly difficult to shift. The switching costs accumulate fast: every workflow you build, every process you wire through a vendor's agentic surface, every policy check and approval chain that runs on their inference becomes a dependency that gets harder to unwind with each passing quarter.

Before you commit, think through what you are actually handing over:

Accountability. When an agentic workflow makes a consequential decision inside your business, who is responsible? Can you trace the chain of steps that led to that decision, or is it opaque by design?
Privacy and trade secrets. Every document, every email, every contract, every internal communication that flows through the vendor's surface is data the vendor has access to. Your competitive intelligence, your customer data, your negotiation positions, your internal strategy — all of it passes through infrastructure you do not own and cannot audit.
Governance. Can your compliance team explain to a regulator how a decision was made when the decision was produced by a chain of inference steps inside a third-party system? Can your legal team satisfy a discovery request when the process that generated the output is controlled by the vendor?
Risk. Your operational processes now depend on a third party's pricing decisions, architectural changes, and service continuity. A thirty-day notice can change the cost structure of workflows your business relies on daily.
Fluctuating costs. The subsidy era trained organisations to treat AI tooling as a fixed cost. It is now a variable cost that scales with usage, and the vendor sets the rate. Budgeting for processes whose cost you cannot predict and whose internal complexity you cannot see is a risk most finance teams have never had to model.
Process dependency. The deeper your workflows are embedded in a vendor's harness, the more your operational continuity depends on decisions made in a boardroom you have no seat in. The vendor's incentives and your organisation's incentives are aligned today. They may not be tomorrow.

The period of artificial market fluidity is over. The vendors have made their moves. The question for every organisation is whether to make a deliberate choice now, while the transition window is still open and the switching costs are still manageable, or to wait until the next round of pricing changes forces the issue on someone else's terms.
