WebMCP: The Agentic Web Gets a Front Door

For the past year, AI agents have been interacting with the web the way a blindfolded person navigates a room: by bumping into things. They take screenshots, parse raw DOM trees, and synthesize click events, hoping the button they need hasn't moved five pixels to the left. It's slow, brittle, expensive, and fundamentally at odds with how the web was designed to work.

That's now changing. Google and Microsoft have jointly proposed WebMCP, a web standard currently being incubated by the W3C Web Machine Learning Community Group, that allows websites to expose structured, callable tools directly to AI agents through the browser. Instead of an agent fumbling through your UI, your website can now hand it a menu and say: "Here's what I can do. Here's how to ask me to do it."

This is one of the most consequential shifts in how the web will operate in the age of AI agents, and it deserves the attention of everyone building for the web today.

From screen-scraping to structured tool calls

To appreciate why WebMCP matters, you need to understand how agents currently interact with websites. There are broadly two approaches: visual and semantic. In the visual approach, an agent evaluates screenshots, reads text through vision models, and infers where to move a cursor on screen. In the semantic approach, it parses the underlying DOM and accessibility tree to extract data and trigger events on elements.

Both require extra roundtrips, multimodal inference, and significant token consumption while agents fumble around trying to understand interfaces that were designed for human eyes and hands. When they work, they're slow. When the UI changes, they break. The cost in compute and reliability is staggering.

WebMCP replaces this guesswork with structure. Through a new browser API, navigator.modelContext, a website can publish a structured list of tools in the form of JavaScript functions with natural language descriptions and JSON schemas that define their inputs and outputs. An AI agent landing on a page equipped with WebMCP tools doesn't need to figure out what buttons do. It knows, because the page told it.

Think of it as the shift from screen-scraping to well-defined APIs that describe what the page can do.

Two APIs: declarative and imperative

WebMCP proposes two complementary approaches for making a website agent-ready.

The Declarative API is the simpler path. Web developers can expose functionality by adding new attributes to standard HTML forms. When developers annotate <form> tags with toolname and tooldescription attributes, Chrome reads them and automatically generates a schema for the agent. When an AI fills the form, Chrome introduces a mechanism to distinguish agent-invoked form submissions from human ones, allowing the backend to handle each appropriately. For most straightforward interactions, such as booking a flight, filing a ticket, or submitting a search, this is all you need.
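A minimal sketch of what that annotation might look like, using the toolname and tooldescription attribute names from the proposal (the exact syntax may change as the specification matures; the form fields here are illustrative):

```html
<!-- An ordinary search form, annotated so the browser can expose it as a tool -->
<form action="/search" method="get"
      toolname="searchFlights"
      tooldescription="Search for flights by origin, destination, and date">
  <input name="origin" placeholder="From">
  <input name="destination" placeholder="To">
  <input name="date" type="date">
  <button type="submit">Search</button>
</form>
```

The browser can derive the input schema from the form fields themselves, so the same markup keeps serving human visitors unchanged.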

The Imperative API handles the more dynamic, complex cases that require JavaScript execution. Through navigator.modelContext, developers can register tools programmatically with full control over input schemas, callbacks, and async behavior:

navigator.modelContext.registerTool({
    name: "getDresses",
    description: "Search for dresses matching specific criteria",
    // JSON Schema describing the tool's expected inputs
    inputSchema: {
        type: "object",
        properties: {
            style: { type: "string" },
            color: { type: "string" },
            occasion: { type: "string" }
        }
    },
    // Invoked when an agent calls the tool; searchInventory() here
    // stands in for the site's existing client-side search logic
    callback: async (params, client) => {
        const results = await searchInventory(params);
        return { content: [{ type: "text", text: JSON.stringify(results) }] };
    }
});

The beauty of this approach is code reuse. The same JavaScript functions that power your existing UI can now be surfaced directly to agents. There's no need to build and maintain a separate backend MCP server, learn a new framework, or re-architect your product. For teams where the business logic is already heavily client-side, the path to agent-readiness is remarkably short.
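That reuse can be as direct as pointing the tool callback at the same function the UI already calls. A sketch of the pattern, where filterDresses and loadInventory are hypothetical stand-ins for your existing client-side code:

```javascript
// Existing client-side logic that already powers the page's UI
// (hypothetical stand-in for your own code).
function filterDresses(inventory, { style, color, occasion }) {
    return inventory.filter(d =>
        (!style || d.style === style) &&
        (!color || d.color === color) &&
        (!occasion || d.occasion === occasion)
    );
}

// Surface the same function to agents; feature-detect so browsers
// without WebMCP support are unaffected. loadInventory() is assumed
// to be the site's existing data loader.
if (typeof navigator !== "undefined" && navigator.modelContext) {
    navigator.modelContext.registerTool({
        name: "getDresses",
        description: "Search for dresses matching specific criteria",
        inputSchema: { /* same schema as the example above */ },
        callback: async (params) => {
            const results = filterDresses(await loadInventory(), params);
            return { content: [{ type: "text", text: JSON.stringify(results) }] };
        }
    });
}
```

Nothing about the page's human-facing behavior changes; the tool registration is purely additive.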

WebMCP is not MCP

The naming might suggest otherwise, but WebMCP is architecturally distinct from Anthropic's Model Context Protocol. MCP uses JSON-RPC for backend client-server communication where AI platforms connect to hosted MCP servers to access tools and resources. WebMCP operates entirely client-side within the browser, using postMessage for communication. The web page itself is the "server" that starts when an agent lands on it.

The two protocols are complementary, not competing. Consider a travel company: it might maintain a backend MCP server for direct API integrations with platforms like Claude or ChatGPT, while simultaneously implementing WebMCP tools on its consumer-facing website so that browser-based agents can interact with its booking flow in the context of a user's active session.

This distinction matters for architects evaluating where to invest. If you're building services that AI platforms consume directly, MCP is still your path. If you want browser-based agents to interact with your existing web UI, WebMCP is the answer. For most enterprises, the answer will be both.

Human-in-the-loop by design

A critical architectural decision separates WebMCP from the fully autonomous agent paradigm that has dominated recent headlines. The standard is explicitly designed around cooperative, human-in-the-loop workflows, not unsupervised automation. Headless browsing and fully autonomous agents are specifically identified as non-goals.

The specification identifies three foundational pillars: Context — all the data agents need to understand what the user is doing; Tools — structured actions the site has chosen to expose; and Actions — discrete function calls the agent can make, scoped to what the website explicitly permits.

This is a meaningful narrowing of scope compared to screen-based approaches. Instead of an agent that can see and do anything a user can, plus read the DOM, you get an agent that can only call specific, pre-defined functions. It's not airtight, and prompt injection risks still exist wherever you have dynamic third-party tools, but it's a fundamentally more secure posture.

The browser acts as a secure proxy, enforcing same-origin policy alongside Feature Policy and Permissions Policy controls, and requiring user confirmation before agents can execute sensitive tools. navigator.modelContext.clearContext() allows the site to remove the context exposed to the agent, helping protect sensitive data during and after agentic sessions. For enterprises concerned about security and compliance in agentic workflows, this permission-first model provides a good foundation.
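In practice, a site might clear agent-visible context at the end of a sensitive flow such as checkout or sign-out. A small sketch using a helper so the call degrades gracefully where the API is absent (clearContext is taken from the draft and may change):

```javascript
// Clear agent-visible context if a WebMCP-style API is available.
// Returns true when context was actually cleared.
function clearAgentContext(modelContext) {
    if (modelContext && typeof modelContext.clearContext === "function") {
        modelContext.clearContext();
        return true;
    }
    return false;
}

// In the page, e.g. at the end of a checkout or sign-out flow:
// clearAgentContext(typeof navigator !== "undefined" ? navigator.modelContext : undefined);
```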

Why this matters

Over many decades, the web has evolved primarily for human eyes and hands. AI agents have been forced to adapt to that reality through increasingly sophisticated but fundamentally fragile hacks. WebMCP allows the web to speak the language of agents natively, without abandoning its human-first interface.

Several aspects make this particularly significant:

  1. The joint backing by Google and Microsoft through the W3C Web Machine Learning Community Group signals serious institutional commitment. This isn't a single-vendor experiment; it's a collaborative effort between the two companies that control the dominant browser engines, with a clear path toward standardization.

  2. The barrier to entry is remarkably low. Frontend developers can start exposing tools using skills they already have, with code they've already written. This is not a backend infrastructure project; it's a frontend enhancement.

  3. Early evaluations suggest that moving from vision-based agent interaction to structured WebMCP tool calls can significantly reduce computational overhead and improve task accuracy.

WebMCP is currently available as an early-preview DevTrial in Chrome 146 Canary behind feature flags, with behavior subject to change as the specification matures. Google has also opened an Early Preview Program for developers to prototype and test.

Looking ahead

We're entering a period where making your web presence "agent-ready" will become as fundamental as making it mobile-responsive was a decade ago. WebMCP provides the cleanest path to get there, leveraging existing frontend code, maintaining the human web interface as primary, and adding a structured layer that agents can consume reliably.

The web doesn't need to be torn down and rebuilt for the agentic era. It just needs a front door that agents can use. WebMCP is shaping up to be that front door.

The specification is still being incubated within the W3C Web Machine Learning Community Group and will evolve, but the architectural direction is clear. If you're building web applications today, now is the time to start thinking about what tools your site should expose, and how your existing JavaScript can serve a new class of consumer.

Anuradha Weeraman

CTO of Verdentra, software architect, and contributor to Linux & Debian. Building intelligent products and systems that scale.