Sundar Pichai told Cloud Next '26 on Wednesday that Google's first-party models now process more than 16 billion tokens per minute through customer API calls, up from 10 billion last quarter. That number is buried in a CEO keynote quote. It is the story.
Token volume through Google's own APIs grew 60 percent in one quarter. That is not marketing language. That is a concrete signal that enterprise use of Google's model layer is compounding. Every other announcement at Cloud Next '26 has to be read against that backdrop.
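The 60 percent figure follows directly from the two numbers in the keynote quote. A minimal sketch of the arithmetic is below; the function and variable names are illustrative, and because Pichai said "more than" 16 billion, the result is best read as a floor.

```python
def growth_rate(previous: float, current: float) -> float:
    """Return fractional growth from `previous` to `current`."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous


# Tokens per minute, as quoted in the keynote (treated here as exact values).
tokens_last_quarter = 10e9
tokens_this_quarter = 16e9

qoq_growth = growth_rate(tokens_last_quarter, tokens_this_quarter)
print(f"Quarter-over-quarter growth: {qoq_growth:.0%}")
```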
Google is not trying to win the model race. It is trying to be the substrate under everyone else's agents.
The TPU Split
Google's headline hardware announcement was the split of its 8th-generation TPU into two chips. TPU 8t is built for training. TPU 8i is built for inference. Two distinct architectures in the same generation.
The two sets of specs tell you where Google thinks the value is. TPU 8t scales to 9,600 chips per pod, delivers 121 exaflops of compute, and claims nearly three times the per-pod training performance of the previous generation. That is a training chip, pointed at frontier model companies.
TPU 8i is the interesting one. It has 288 GB of high-bandwidth memory, 384 MB of on-chip SRAM (triple the previous generation), 19.2 Tb/s of interconnect bandwidth, and a claimed 80 percent better performance per dollar than the prior chip. Google also claims on-chip latency up to five times lower. That is an inference chip, and it is aimed squarely at agentic workloads where cost-per-token and time-to-first-token decide whether a deployment ships or dies in pilot.
The chips run on Axion, Google's Arm-based CPU. Citadel Securities is named as a customer. The Virgo Network can cluster up to one million TPUs across multiple data centers. Target goodput, meaning productive compute time, sits at approximately 97 percent. General availability is slated for later this year.
Nvidia's Rubin GPUs still offer higher per-chip performance in isolated comparisons. Google's bet is scale and integration. The two-chip architecture is the admission that training and inference have diverged as workloads to the point where one silicon design no longer optimizes for both. That is not a small admission, and the fact that it is coming from the company that pioneered the TPU matters.
Gemini Enterprise: The Agent Platform
Hardware was the floor. The platform is the argument.
Gemini Enterprise is Google's new agent platform, built on Vertex AI. It combines model selection, agent building, orchestration, and a set of governance features that Google is clearly pitching to CIOs nervous about AI agent sprawl. The feature list is specific:
- A flowchart tool for multi-agent coordination
- Agent Studio for building agents from natural language
- A Memory Bank that gives agents long-term memory across sessions
- Sandboxed environments for code and browser automation
- A central registry to prevent duplicate agents spawning in production
The security layer is where Google is trying to differentiate. Each agent gets a cryptographic identity. Upstream prompt injection filters sit in front of the model. Anomaly detection watches for unauthorized access. Simulation tools let teams test agents before production. Three new Google Security Operations agents (Threat Hunting, Detection Engineering, and Third-Party Context) round out what Google is calling the agentic security stack.
Notably, the platform runs not just Gemini 3.1 Pro but also Anthropic's Claude family: Opus, Sonnet, and Haiku, including Opus 4.7. That is the same multi-model posture we covered in Google's agent layer strategy against Anthropic. Google is happy to run Anthropic models if that is what gets the customer to stay on Google Cloud.
The bet: enterprise CIOs pay for the agent governance layer, and the model itself becomes a commodity they plug in.
The $750 Million Fund and the Thinking Machines Deal
Google paired the platform with distribution.
A new $750 million partner fund will support agentic AI adoption among consulting firms, systems integrators, software vendors, and channel partners. Specific resources include AI value assessments, Gemini proofs-of-concept, practice building, prototyping, security assessments, and deployment incentives. This is the enterprise sales motion, priced at three quarters of a billion dollars, aimed at the Accentures and Capgeminis who still mediate most Fortune 500 AI decisions.
Thinking Machines Lab, the frontier AI startup founded by former OpenAI researchers, is the flagship customer announcement. Google Cloud is giving Thinking Machines A4X Max VMs with Nvidia GB300 GPUs, plus Kubernetes Engine, Spanner, Cluster Director, Cloud Storage, and Anywhere Cache. Myle Ott, Thinking Machines' founding researcher, said the setup let the company operate "at record speed."
Two details worth pulling out. First, Thinking Machines is running Nvidia GB300s inside Google Cloud. Google will happily sell you Nvidia alongside TPUs if that keeps the workload on its cloud. Second, landing Thinking Machines is a counter to Anthropic's enterprise AI OS positioning. Google wants to be the infrastructure under everyone training frontier models, not just the lab building one of them.
Where This Competes
Three fights are running at once.
Against Nvidia, Google is pitching full-stack integration and scale. A single pod of 9,600 TPUs with 2 petabytes of shared memory is not a spec Nvidia can match without significant networking work. For training runs where bandwidth and coherence matter, Google has a story. For everything else, Nvidia still wins on raw per-chip performance and developer mindshare.
Against Anthropic and OpenAI, Google is running a different play. It is not trying to build the best frontier model. It is trying to be the cheapest, most governable place to run agents that use any model. If that thesis works, Anthropic's model wins and Google's infrastructure wins. OpenAI, which is pivoting $1.5 billion toward enterprise rewiring with its own PE-backed venture, has to decide whether to own its own infrastructure or rent it.
Against its own cloud competitors, AWS and Azure, Google is ahead on the integrated agentic stack. Bedrock and AI Foundry are strong on model access. Neither has the TPU story, the Workspace layer, or the governance wrapper that Gemini Enterprise is bundling together.
The Commerce and Payments Angle
This is where the Google announcement lands in our specific patch.
Agentic commerce runs on inference economics. Every agent that decides to buy something on behalf of a consumer or a business represents a specific cost-per-decision. TPU 8i's claimed 80 percent better performance per dollar works out to roughly a 44 percent lower cost for the same work (1 - 1/1.8), assuming the gain is passed through in pricing. A cut of that size moves agent-originated transactions from pilot to production for a whole new tranche of merchants and banks. The agent tax shrinks.
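The relationship between a performance-per-dollar gain and a drop in cost-per-decision is easy to misread, so a minimal sketch may help. It assumes "X percent better performance per dollar" means (1 + X) times as much work for the same spend, and that the gain is fully passed through to the buyer; neither assumption is confirmed by Google, and the function names are illustrative.

```python
def cost_reduction_from_perf_gain(gain: float) -> float:
    """Fractional drop in cost per unit of work for a perf-per-dollar gain.

    Assumes `gain` means (1 + gain) times as much work per dollar.
    """
    if gain <= -1:
        raise ValueError("gain must be greater than -1")
    return 1 - 1 / (1 + gain)


def perf_gain_for_cost_reduction(reduction: float) -> float:
    """Perf-per-dollar gain needed to cut cost per unit of work by `reduction`."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction) - 1


# The claimed 80 percent perf-per-dollar improvement.
implied_cut = cost_reduction_from_perf_gain(0.80)
print(f"Implied cost reduction: {implied_cut:.1%}")  # 44.4%

# What an actual 80 percent cost reduction would require.
needed_gain = perf_gain_for_cost_reduction(0.80)
print(f"Gain needed for an 80% cost cut: {needed_gain:.0%}")  # 400%
```

Under these assumptions, an 80 percent gain cuts cost-per-decision by a little under half. Cutting it by 80 percent would take roughly a fivefold improvement in performance per dollar.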
The payments layer has the same exposure. Real-time fraud models are inference workloads running continuously. Faster, cheaper inference means richer models in the authorization path. Richer models in the authorization path mean the asymmetry between attacker and defender starts to close, which we wrote about recently in the context of the fraud detection window shrinking to nothing. Google's hardware story is one of the levers that changes that asymmetry.
The Gemini Enterprise security agents are the direct play. Threat Hunting, Detection Engineering, Third-Party Context. Every bank already runs those functions with humans augmented by models. Google is packaging them as agents a CIO can buy, not build. That reshapes the buyer for fraud and risk tooling, and it puts Google in direct competition with the vendors we cover in the AI Tools Directory.
What To Watch
Four signals over the next two quarters.
First, whether Google publishes concrete pricing for TPU 8i when it ships. The 80 percent performance-per-dollar claim needs a price sheet to be useful. Until enterprise buyers can run the math, it stays a marketing number.
Second, whether the $750 million partner fund produces named deployments. A fund of that size should buy visible wins at Accenture, Deloitte, TCS, or Capgemini within six months. Silence for a year suggests the deployment friction is worse than Google admits.
Third, whether Anthropic's presence on Gemini Enterprise deepens or stays symbolic. If Claude usage on Google Cloud grows materially, the multi-model thesis holds. If Anthropic builds its own platform and customers move, Google's governance wrapper was not enough.
Fourth, whether the token growth rate Pichai disclosed holds. Sixty percent quarter-over-quarter is the kind of number that either compounds or reverts. The next earnings call will tell us which.
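To make "compounds or reverts" concrete, here is a minimal sketch that projects the disclosed figure forward under two scenarios. The 16 billion starting point and the 60 percent rate come from the keynote numbers. The 15 percent "reversion" rate and the four-quarter horizon are purely hypothetical assumptions chosen for illustration.

```python
def project(start: float, quarterly_rate: float, quarters: int) -> list[float]:
    """Compound `start` at `quarterly_rate` for `quarters` quarters.

    Returns the starting value followed by one value per quarter.
    """
    if quarters < 0:
        raise ValueError("quarters must be non-negative")
    return [start * (1 + quarterly_rate) ** q for q in range(quarters + 1)]


START_TOKENS_PER_MIN = 16e9  # disclosed figure, treated as exact
HOLD_RATE = 0.60             # the disclosed quarter-over-quarter rate
REVERT_RATE = 0.15           # hypothetical lower rate, not from the source
QUARTERS = 4                 # hypothetical horizon

holds = project(START_TOKENS_PER_MIN, HOLD_RATE, QUARTERS)
reverts = project(START_TOKENS_PER_MIN, REVERT_RATE, QUARTERS)

for q, (h, r) in enumerate(zip(holds, reverts)):
    print(f"Q+{q}: holds {h / 1e9:6.1f}B   reverts {r / 1e9:6.1f}B tokens/min")
```

If the rate held for a year, volume would reach roughly 105 billion tokens per minute, about 6.5 times today's figure. That is why a single quarter at this rate says little until the next data point arrives.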
If Google makes itself the substrate for everyone else's agents, does the model layer commoditize faster than anyone in the lab business is pricing in?
Charlie Major is a Product Development Manager at Mastercard. The views and opinions expressed in Major Matters are his own and do not represent those of Mastercard.