Contact emails
dome...@chromium.org, sushr...@microsoft.com, m...@chromium.org, kenjibah...@chromium.org, fran...@microsoft.com
Explainer
https://github.com/webmachinelearning/prompt-api?tab=readme-ov-file#tool-use

Specification
None yet.

Summary
We would like to prototype Function Calling in the Prompt API. At a high level, this allows developers to define additional capabilities that a language model can invoke as it processes a prompt, either to trigger actions or to obtain additional information needed to complete a task.

Blink component
Blink > AI > Prompt

Motivation
Large Language Models (LLMs) are most effective when they can move beyond text generation and interact with external data and systems. This capability, commonly known as "Function Calling" or "Tool Use," allows a model to query APIs, perform actions, or access real-time information, transforming it from a passive chatbot into an active assistant.

We propose adding Function Calling to the Prompt API to bring this capability to the web platform for on-device models. Enabling developers to define tools that the on-device model can use will unlock a new class of rich, AI-powered web experiences.

Based on strong developer signals from Chrome’s built-in AI hackathon <https://googlechromeai.devpost.com/> and early preview program <https://developer.chrome.com/docs/ai/join-epp>, there is significant demand for this feature across a range of applications:

* Natural language UX: A flight search website could use this capability to let users express their needs more directly, without having to discover and learn more advanced controls (e.g. search filters).
* Data-Driven Applications: A developer building a data visualization dashboard could provide tools to query and present data. This would allow users to ask questions like, "Show me sales from last quarter for the EMEA region" and have the model translate the request into a structured query and graph definitions for the respective tools.
* Creative and Productivity Tools: An editor could expose building blocks and fundamental actions as tools, allowing its users to create powerful macros and custom features without having to learn a new language.

Adding this capability to the Prompt API will provide a standardized, model-agnostic way for web developers to integrate their site's logic with an on-device LLM, all while keeping user data on-device.

Relationship to Script Tools
This proposal is related to, yet distinct from, the Script Tools API (intent to prototype <https://groups.google.com/a/chromium.org/g/blink-dev/c/W444JxsqxZw/m/b7E_JLXjBwAJ>). While both involve exposing JavaScript functions for AI use, they serve distinct, complementary purposes. The key difference lies in who orchestrates the interaction:

* Prompt API Function Calling (This Proposal): The web page (web developer) is the orchestrator. A developer uses the Prompt API to call the on-device model to enhance their site's own features. The page defines the tools and initiates the prompt, controlling the entire interaction. This is an "inward-facing" capability for the site to use AI.
* Script Tools: An external agent is the orchestrator. A web page uses the Script Tools API to expose its capabilities to an external agent (e.g. a browser or OS-level agent, or other features and apps, including accessibility-related ones). The external agent, acting on a user's behalf, discovers and invokes these tools to perform tasks on the page. This is an "outward-facing" capability for the site to work well when used through an external agent.

These APIs are two sides of the same coin, and we are closely collaborating to ensure a unified developer narrative. A website could use the Prompt API to enhance its own UX while also exposing Script Tools to allow an external agent (e.g. browser or OS agent, other apps/features) to automate tasks on the user’s behalf.
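
To make the "page as orchestrator" model concrete, here is a minimal sketch built around the flight search use case. It does not call the real Prompt API: `createMockSession` is a hypothetical stand-in for an on-device model session, and the tool fields (`name`, `description`, `inputSchema`, `execute`) are assumptions chosen for illustration rather than a confirmed API shape. The flight data is invented sample data.

```javascript
// Page-defined tool. The field names (name, description, inputSchema, execute)
// are assumptions for illustration, not a confirmed API shape.
const searchFlightsTool = {
  name: "searchFlights",
  description: "Search flights by destination and maximum price.",
  inputSchema: {
    type: "object",
    properties: {
      destination: { type: "string" },
      maxPrice: { type: "number" },
    },
    required: ["destination"],
  },
  // Runs in the page, so it can reach page state directly.
  // The flight list below is invented sample data.
  async execute({ destination, maxPrice = Infinity }) {
    const flights = [
      { destination: "Lisbon", price: 120 },
      { destination: "Lisbon", price: 340 },
      { destination: "Oslo", price: 90 },
    ];
    return flights.filter(
      (f) => f.destination === destination && f.price <= maxPrice
    );
  },
};

// Hypothetical stand-in for an on-device model session. A real model would
// decide which tool to call and with which arguments; this mock always calls
// the first tool with the canned arguments it was given.
function createMockSession({ tools, cannedArguments }) {
  return {
    async prompt(text) {
      const tool = tools[0];
      const result = await tool.execute(cannedArguments);
      return `Used ${tool.name} for "${text}": ${JSON.stringify(result)}`;
    },
  };
}

// The page orchestrates: it defines the tools and initiates the prompt.
async function main() {
  const session = createMockSession({
    tools: [searchFlightsTool],
    cannedArguments: { destination: "Lisbon", maxPrice: 200 },
  });
  return session.prompt("Cheap flights to Lisbon");
}
```

The point of the sketch is the direction of control: the tool implementation and the prompt both live in the page, and nothing outside the page discovers or invokes the tool.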
Alternative Considered: Direct MCP Integration
We considered exposing a lower-level protocol like the Model Context Protocol (MCP) directly to the web. While MCP is emerging as a standard for agents interacting with external tools and data, we believe that a higher-level, native API is better suited for the web platform, in alignment with W3C design principles and developer ergonomics.

Our approach, like that of Script Tools, is to provide an interface that is isomorphic with the concepts in MCP but abstracts away the low-level protocol details. This design has several advantages:

* Developer Ergonomics: It provides a simple, familiar JavaScript API that is easier for web developers to adopt and integrate into their existing client-side logic.
* Direct Access to In-Page State: The API naturally operates within the page's execution context, giving tools trivial access to the DOM, session data, and other transient state generated by the user. A direct MCP integration would require an additional bridge to communicate this live, in-tab context to wherever the tools are called.
* Durability and Decoupling: It avoids tightly coupling the web platform to a specific version of an external protocol that is still rapidly evolving.
* Alignment with client-side upsides: Having the tools defined locally in JavaScript makes it easier to benefit from the advantages of client-side AI, in particular resilience to network flakiness and offline conditions, as well as privacy and cost.

By providing a web-platform-specific API that is isomorphic to MCP, we deliver immediate value to developers in a way that is both powerful and ergonomic. This pragmatic approach is the right first step for the web platform and does not preclude a more direct MCP integration in the future as the protocol stabilizes.

Initial public proposal
None yet.

TAG review
None yet.

TAG review status
TBD.
Risks

Interoperability and Compatibility
Adding function calling to the Prompt API helps alleviate one interoperability concern between the different models that implement the Prompt API. Language models use special tokens to define tools and parameters. These tokens, and the template used to declare tool availability, are model-specific. For example, Phi 4, the model used by Microsoft Edge, uses the following format to declare tools in its system prompt:

<|tool|>[{"name": "get_weather_updates", "description": "Fetches weather updates for a given city using the RapidAPI Weather API.", "parameters": {"city": {"description": "The name of the city for which to retrieve weather information.", "type": "str", "default": "London"}}}]<|/tool|>

Gemini Nano, the model used by Chrome, has a different format that is not public (for reference, see Gemma’s documentation on the topic <https://ai.google.dev/gemma/docs/capabilities/function-calling>).

Today, web developers eager to use function calling with the Prompt API on user agents that implement it with Phi 4 can pass the exact string above and receive responses like the one below, which they can parse into a JavaScript function call:

<|tool_call|>[{"name": "get_weather_updates", "arguments": {"city": "San Francisco"}}]<|/tool_call|>

Such an implementation is not interoperable. Our function calling proposal builds an abstraction that lets web developers use function calling in an interoperable way, handling the conversion of a high-level function description into model-specific syntax.
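
The conversion described above can be sketched as a small per-model adapter. This is an illustration only, not how any user agent implements it: `phi4StyleAdapter`, `declareTools`, and `parseToolCalls` are hypothetical names, and the only formats assumed are the two Phi 4 strings quoted above. An adapter for another model would implement the same two methods with its own template.

```javascript
// Model-agnostic tool declaration, reusing the weather example quoted above.
const tools = [
  {
    name: "get_weather_updates",
    description: "Fetches weather updates for a given city.",
    parameters: {
      city: {
        description: "The name of the city.",
        type: "str",
        default: "London",
      },
    },
  },
];

// Hypothetical adapter for the Phi 4 style template. It wraps a JSON list of
// declarations in <|tool|> markers and extracts the JSON list found between
// <|tool_call|> markers in the model output.
const phi4StyleAdapter = {
  declareTools(toolList) {
    // JSON.stringify emits no spaces after ":" or ",", unlike the quoted
    // example; this sketch assumes whitespace is not significant.
    return `<|tool|>${JSON.stringify(toolList)}<|/tool|>`;
  },
  parseToolCalls(modelOutput) {
    const match = modelOutput.match(
      /<\|tool_call\|>([\s\S]*?)<\|\/tool_call\|>/
    );
    // No markers means the model answered in plain text.
    return match ? JSON.parse(match[1]) : [];
  },
};

// Usage: declare the tools, then parse a response in the quoted format.
const systemPromptFragment = phi4StyleAdapter.declareTools(tools);
const calls = phi4StyleAdapter.parseToolCalls(
  '<|tool_call|>[{"name": "get_weather_updates", ' +
    '"arguments": {"city": "San Francisco"}}]<|/tool_call|>'
);
```

With this split, page code only ever sees the model-agnostic `tools` list and the parsed `calls` array; the marker strings stay inside the adapter, which is the part a user agent would own.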
Gecko: No signal on function calling specifically (also see Prompt API position <https://github.com/mozilla/standards-positions/issues/1213>)

WebKit: No signal on function calling specifically (also see Prompt API position <https://github.com/WebKit/standards-positions/issues/495>)

Web developers: Early signals indicating high interest (see the Motivation section and engagement on the WICG repo <https://github.com/webmachinelearning/prompt-api/issues/7>)

Is this feature fully tested by web-platform-tests <https://chromium.googlesource.com/chromium/src/+/master/docs/testing/web_platform_tests.md>?
Aiming for proper test coverage.

Requires code in //chrome?
True

Tracking bug
http://crbug.com/422803232

Estimated milestones
No milestones specified.

Link to entry on the Chrome Platform Status
https://chromestatus.com/feature/5085580563841024