Microsoft's Open Browser Agents Beat OpenAI's Closed One

For most of the last 18 months, the assumption behind agentic commerce has been that a small number of US labs would intermediate it. OpenAI would run the agent. Google would run the agent. Anthropic would run the agent. Every merchant, every payments network, every BNPL provider would have to negotiate with those agents from a position of asymmetric weakness, because the agents would be closed, hosted, and expensive to replicate.

That assumption took a knock this week.

When a 27-billion-parameter open-weight model from Microsoft Research beats the closed flagship browser agents on the standard benchmark, the chokepoint thesis stops working.

What Microsoft shipped

Microsoft Research released Fara1.5, a family of browser computer-use agents at three sizes: 4B, 9B, and 27B parameters. The weights are open. The benchmark numbers are the headline.

Fara1.5-27B scores 72 percent on Online-Mind2Web, the standard test for browser agents navigating live websites. That number beats OpenAI's Operator. It beats Google's Gemini 2.5 Computer Use. It beats Yutori's Navigator n1.

The 4B model is the more interesting one for what comes next. A 4B-parameter model can run on a single high-end laptop. It does not need a hosted API. It does not need a per-call charge. It does not need to log every shopping query into a third-party datastore.

The 9B sits in the middle, comfortable on commodity GPU hardware that any mid-sized merchant or payments processor already runs.
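The hardware claims above follow from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. Here is a minimal sketch of that estimate. The precision levels (fp16, int8, int4) are assumptions chosen for illustration, not formats Microsoft has confirmed for Fara1.5, and the figures cover weights only, so real usage with activations and runtime overhead will be higher.

```python
# Back-of-envelope weight-memory estimate: parameters x bytes per parameter.
# Assumption: the precisions below are illustrative, not confirmed Fara1.5 formats.
# Ignores activations, KV cache, and runtime overhead, so real usage is higher.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * (bytes per param) / (1e9 bytes per GB)
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for size in (4, 9, 27):
        row = ", ".join(
            f"{precision}: {weight_memory_gb(size, precision):.1f} GB"
            for precision in BYTES_PER_PARAM
        )
        print(f"{size}B -> {row}")
```

Under these assumptions the 4B weights need about 8 GB at fp16 and about 2 GB at 4-bit, which is laptop territory. The 27B weights need about 54 GB at fp16, which is why the largest model is the benchmark story and the smallest is the deployment story.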

Why the size matters more than the score

If Microsoft had only released the 27B, this would be a closed-versus-open headline and nothing more. They released the 4B at the same time, and that changes the implication.

A 4B browser agent that runs locally is a different kind of infrastructure than a closed agent that runs in OpenAI's cloud. The local agent does not phone home. The local agent does not require a commercial agreement with the model provider to operate at scale. The local agent does not surface every transaction to a third party who can decide later to monetize that visibility.

That is the part that should worry OpenAI and gladden merchants and processors. Until this week, the working assumption was that agentic commerce visibility would concentrate inside three or four hosted agents, and that the rest of the industry would be data-poor by comparison. With open weights at sub-10B scale, the visibility distributes.

This is the same pattern we saw with Llama and then DeepSeek in the general-purpose LLM space over the last 18 months. The closed labs led the benchmarks for a while. Then open weights caught up. Then open weights became the production default for any company that did not want to be dependent on a closed provider.

Browser agents are now on that curve.

What this means for agentic commerce

Two things shift.

The first is the question of who runs the buyer-side agent. Until now, the realistic answer was OpenAI or Google. With Fara1.5, the answer can be the merchant, the payments network, the bank, or the consumer's own device. A merchant who runs its own buyer-side agent can shape how products are described to it, what trust signals are checked, what the dispute path looks like. The same goes for a card network running a fraud-aware agent on behalf of issuers.

The second is the question of who runs the seller-side agent. Google is already testing llms.txt as an agentic browsing audit, which is a quiet way of measuring whether merchant sites are agent-ready. With open browser agents at this quality, both sides of a transaction can field models. The asymmetry that closed agents create starts to collapse.

We have been saying for the last six months that the state of the agentic commerce stack was the load-bearing layer to watch, alongside Google's A2UI generative UI standard for agents. Open-weight agents do not break those theses. They sharpen them. Protocols matter more, not less, when both sides can run their own agents and have to interoperate.

The control problem the open release solves

There is a specific control problem that closed browser agents created and open ones make easier to solve.

If your merchant business depends on Operator's shopping decisions, you have no path to audit how those decisions are being made. You see what Operator shows the user. You do not see why. You do not get to inspect the reasoning trace. You do not get to challenge a misranking without going through a commercial relationship with OpenAI.

With an open model in the 4B to 27B range, an enterprise can host its own version, fine-tune it on its own data, audit its own traces, and run it through its own compliance review. That is what a payments network or a regulated financial services firm actually needs to integrate browser agents into customer journeys without exposing the firm to control risk it cannot quantify.

The closed-agent path was always going to hit this wall. It just hit it sooner than we expected.

What is still not solved

A better browser agent does not solve identity. It does not solve payments authorization. It does not solve dispute handling. It does not solve the question of who is liable when an agent buys the wrong thing or buys the right thing on the wrong card.

These are the same gaps we mapped in our pieces on the missing commitment governance layer for agentic commerce and on Finix plugging three frontier models into its processor with the liability layer still empty. Fara1.5 makes the agent more capable. It does not make the agent safer to transact through.

In some ways open weights make the safety question harder. With a closed agent, the model provider takes some share of the safety problem. With an open agent running locally, the safety problem lands fully on whoever is fielding the agent. Merchants and processors taking advantage of the new flexibility will also inherit a new layer of risk to manage.

That is a trade we expect most of them to take. The alternative is letting OpenAI and Google decide what their buyers' intent is.

What to watch next

Three things to track over the next quarter.

First, which enterprise vendors integrate Fara1.5 weights into their products. Stripe, Adyen, Mastercard's account-to-account stack, and the various agentic commerce protocol consortia all have an obvious interest. Open weights at this quality remove the build-versus-buy question on the agent itself.

Second, how OpenAI and Google respond. They will not concede the agent layer. Expect price cuts on Operator and Gemini Computer Use, expect new closed model releases at smaller sizes, and expect tighter integration with payments rails to make the closed agents stickier than the open ones.

Third, whether the open-weight agent advantage holds at the next benchmark cycle. Fara1.5 leads today. Microsoft has been disciplined about iteration cadence with the Fara line. The 1.5 release is unlikely to be the ceiling.

If the open agents stay ahead, the assumption that a handful of US labs would broker agentic commerce stops being a working assumption at all.

If the buyer-side agent runs on the merchant's hardware and the seller-side agent runs on the consumer's device, who is the platform?

Charlie Major is a Product Development Manager at Mastercard. The views and opinions expressed in Major Matters are his own and do not represent those of Mastercard.
