For two years the question every payments firm asked about AI was whether the model was good enough. That question is mostly settled. The models can read a dispute, score a transaction, summarize a merchant's risk profile, and hold a conversation with a cardholder. What firms are discovering now is that having a capable model and running one inside a live payment flow are different problems, and the second one is harder than the first.

PYMNTS put it plainly this week: the industry is finding that AI's real limit is infrastructure, not the technology in the abstract. The bottleneck is no longer how smart the model is. It is whether the rails underneath can run it for every transaction, at a cost that works, fast enough that the customer does not wait.

The hard part of AI in payments stopped being the model a while ago. It is the inference bill, the latency budget, and the data the model can actually reach at the moment of the transaction.

Inference is the cost that does not go away

Training a model is largely an up-front expense. Inference, the act of running the model every time it answers, is a cost you pay on every single transaction, for as long as the feature is live. At the volumes payments firms run, that bill decides whether an AI feature ships or stays a demo.

This is the context for the custom silicon arms race. OpenAI and Broadcom unveiled Jalapeño, a chip built specifically for running large language models rather than training them. Nobody designs an inference chip because the models are not smart enough. They do it because running those models at scale is too slow and too expensive on general-purpose hardware, and shaving cents off each inference is the difference between a feature that scales and one that bankrupts the unit economics.
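
To make the unit-economics point concrete, here is a minimal sketch in Python. Every figure in it is a hypothetical placeholder, not a number from any vendor or from this column. It only shows the shape of the arithmetic: a per-inference cost is multiplied by transaction volume, and it has to fit inside the margin each transaction earns.

```python
from decimal import Decimal

# All figures below are hypothetical, chosen only to illustrate the arithmetic.
TRANSACTIONS_PER_MONTH = 500_000_000
COST_PER_INFERENCE = Decimal("0.002")      # assumed cost of one model call, in dollars
MARGIN_PER_TRANSACTION = Decimal("0.02")   # assumed net margin per transaction, in dollars


def monthly_inference_cost(transactions: int, cost_per_inference: Decimal) -> Decimal:
    """Total monthly bill if the model runs once on every transaction."""
    return transactions * cost_per_inference


def share_of_margin(cost_per_inference: Decimal, margin_per_transaction: Decimal) -> Decimal:
    """Fraction of each transaction's margin consumed by a single inference."""
    return cost_per_inference / margin_per_transaction


if __name__ == "__main__":
    bill = monthly_inference_cost(TRANSACTIONS_PER_MONTH, COST_PER_INFERENCE)
    share = share_of_margin(COST_PER_INFERENCE, MARGIN_PER_TRANSACTION)
    print(f"Monthly inference bill: ${bill:,.2f}")
    print(f"Share of per-transaction margin: {share:.0%}")
```

Under these assumed numbers, a cost that looks negligible per call becomes a seven-figure monthly line item. Swap in your own volume and pricing to see where the feature stops paying for itself.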

We have tracked this shift before. The latency premium that chipmakers built for human-facing AI does not match what an automated transaction actually needs, and the hardware pipeline is being rewired around that gap. The frontier labs are vertically integrating down to the silicon because inference is where the economics are decided.

The scale of the bet is its own warning

The spending required to meet this demand has stopped being abstract. Oracle is funding a multi-billion-dollar data center buildout partly through debt, and partly by cutting 21,000 jobs to redirect the money toward infrastructure. That is the cost of admission to the inference business, and it tells you the constraint is physical: power, chips, buildings, cooling.

For a payments firm, this reframes the build-versus-buy question. Owning the infrastructure means committing capital at a scale only the largest players can sustain. Renting it means accepting that your AI economics are set by whoever owns the data center, and that your latency depends on their network. We argued during the compute shortage that agentic commerce would hit a physical ceiling before a conceptual one. The infrastructure spending happening now is the industry trying to raise that ceiling, and it is not cheap.
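
The build-versus-buy trade-off can be roughed out as a break-even volume. The sketch below is a simplification under stated assumptions: owning is modeled as a fixed monthly cost plus a small marginal cost per inference, renting as a flat price per inference. All names and figures are hypothetical.

```python
from decimal import Decimal
from typing import Optional


def break_even_volume(
    owned_fixed_monthly: Decimal,
    owned_cost_per_inference: Decimal,
    rented_price_per_inference: Decimal,
) -> Optional[Decimal]:
    """Monthly inference count at which owning and renting cost the same.

    Returns None when renting is never more expensive per call, because in
    that case owning cannot recover its fixed cost at any volume.
    """
    saving_per_call = rented_price_per_inference - owned_cost_per_inference
    if saving_per_call <= 0:
        return None
    return owned_fixed_monthly / saving_per_call


if __name__ == "__main__":
    # Hypothetical inputs: $3M per month in amortized hardware, power and staff,
    # $0.0005 marginal cost per owned inference, $0.002 per rented inference.
    volume = break_even_volume(Decimal("3000000"), Decimal("0.0005"), Decimal("0.002"))
    print(f"Break-even at {volume:,.0f} inferences per month")
```

The model leaves out the things this section argues matter most when renting: the provider controls the price, and latency depends on their network. Treat the result as a floor on the volume needed to justify owning, not as a decision rule.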

The data layer is the part everyone underrates

A model is only as useful as the data it can reach at the moment it answers. MIT Technology Review described the emergence of a web data infrastructure layer for AI, built on the recognition that most of the information enterprises need is blocked, unstructured, or trapped in systems the model cannot query in real time.

In payments, this is the harder half of the problem and the one that gets the least attention. The model can be brilliant and the chip can be fast, but if the fraud decision needs a customer's transaction history, the merchant's risk profile, and a sanctions check, and those three sources live in systems that answer in their own time, the AI waits with everything else. The intelligence is not the constraint. The data plumbing is.
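
A small sketch of that latency arithmetic, with hypothetical response times for the three sources named above. It assumes the model cannot run until all three lookups have returned. Fetched one after another, the waits add up. Fetched in parallel, the decision still cannot come back sooner than the slowest source plus the model itself.

```python
# Hypothetical response times in milliseconds, for illustration only.
SOURCE_LATENCIES_MS = {
    "transaction_history": 40,
    "merchant_risk_profile": 25,
    "sanctions_check": 120,
}
MODEL_LATENCY_MS = 30      # assumed time for the model to score the transaction
LATENCY_BUDGET_MS = 100    # assumed budget for the whole fraud decision


def decision_latency_ms(sources: dict, model_ms: int, parallel: bool = True) -> int:
    """Time to a decision when the model needs every source before it can run."""
    waits = sources.values()
    fetch_ms = max(waits, default=0) if parallel else sum(waits)
    return fetch_ms + model_ms


def slowest_source(sources: dict) -> str:
    """Name of the source that sets the floor on a parallel fetch."""
    return max(sources, key=sources.get)


if __name__ == "__main__":
    for parallel in (False, True):
        total = decision_latency_ms(SOURCE_LATENCIES_MS, MODEL_LATENCY_MS, parallel)
        mode = "parallel" if parallel else "sequential"
        verdict = "within" if total <= LATENCY_BUDGET_MS else "over"
        print(f"{mode}: {total} ms ({verdict} the {LATENCY_BUDGET_MS} ms budget)")
    print(f"Bottleneck: {slowest_source(SOURCE_LATENCIES_MS)}")
```

With these assumed numbers, even the parallel case misses the budget, and a faster model or chip would not close the gap because the model accounts for only 30 ms of the total. The only fix is a faster slowest source.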

What this means for the firms deciding now

The reframe is uncomfortable because it favors a different kind of company. If the model were the constraint, the winner would be whoever had the cleverest AI. Since the constraint is infrastructure, the advantage goes to whoever controls reliable, low-latency, affordable inference and clean access to the data the model needs. Those are not the same companies, and the second list is shorter.

For most payments firms, the practical conclusion is to stop shopping for models and start auditing the rail underneath. What does an inference cost at your transaction volume? What latency can you actually promise a customer when the model is in the loop? Which data does the model need at decision time, and can your systems serve it in milliseconds rather than minutes? Those questions decide whether an AI feature ships, and none of them is about how smart the model is.

The models got smart enough. The firms that win the next phase will be the ones that built, or rented well, the infrastructure to run them.

If the model is no longer the hard part, how much of your AI roadmap is really a budget for inference and data you have not priced yet?

Charlie Major is a Product Development Manager at Mastercard. The views and opinions expressed in Major Matters are his own and do not represent those of Mastercard.