LGTM to experiment from M139 to M144 inclusive.
On 5/16/25 3:18 AM, Domenic Denicola wrote:
Contact emails
a...@chromium.org, m...@chromium.org, btri...@chromium.org,
dome...@chromium.org, kenjibah...@chromium.org
Explainer
https://github.com/webmachinelearning/prompt-api/blob/main/README.md
Specification
None yet, although some of the shared infrastructure in
https://webmachinelearning.github.io/writing-assistance-apis/#supporting
will be used.
Summary
An API designed for interacting with an AI language model using text,
image, and audio inputs. It supports various use cases, from
generating image captions and performing visual searches to
transcribing audio, classifying sound events, generating text
following specific instructions, and extracting information or
insights from text. It supports structured outputs, which constrain
responses to a predefined format, typically expressed as a JSON
schema, so that downstream applications requiring standardized output
formats can consume them reliably.
This API is also exposed in Chrome Extensions, currently as an Origin
Trial. This Intent is for exposure as an Origin Trial on the web.
Blink component
Blink>AI>Prompt
<https://issues.chromium.org/issues?q=customfield1222907:%22Blink%3EAI%3EPrompt%22>
TAG review
https://github.com/w3ctag/design-reviews/issues/1093
TAG review status
Pending
Risks
Interoperability and Compatibility
This feature, like all built-in AI features, has inherent
interoperability risks due to the use of AI models whose behavior is
not fully specified. See some general discussion in
https://www.w3.org/reports/ai-web-impact/#interop.
In particular, because the output in response to a given prompt varies
by language model, it is possible for developers to write brittle code
that relies on specific output formats or quality, and does not work
across multiple browsers or multiple versions of the same browser.
There are some reasons to be optimistic that web developers won't
write such brittle code. Language models are inherently
nondeterministic, so creating dependencies on their exact output is
difficult. And many users will not have the hardware necessary to run
a language model, so developers will need to write code in which the
prompt API is always an enhancement, or has an appropriate fallback to
cloud services.
Several parts of the API design help steer developers in the right
direction, as well. The API has clear availability testing features
for developers to use, and requires developers to state their required
capabilities (e.g., modalities and languages) up front. Most
importantly, the structured outputs feature can help developers avoid
writing brittle code that relies on specific output formats.
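To illustrate the enhancement-plus-fallback pattern described above, here is a minimal TypeScript sketch. The availability states and the expectedInputs option are modeled on the explainer and should be treated as assumptions that may change; Strategy, chooseStrategy, ModelProvider, and planImageCaptioning are hypothetical names introduced only for this example.

```typescript
// Availability states, modeled on the explainer (an assumption; may change).
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

// Hypothetical strategies a page might choose between.
type Strategy = "on-device" | "on-device-after-download" | "cloud" | "skip-feature";

// Pure decision step: the built-in model is an enhancement, never a requirement.
function chooseStrategy(availability: Availability, hasCloudFallback: boolean): Strategy {
  switch (availability) {
    case "available":
      return "on-device";
    case "downloadable":
    case "downloading":
      // Model not ready yet: use the cloud if we have it, otherwise wait for the download.
      return hasCloudFallback ? "cloud" : "on-device-after-download";
    case "unavailable":
      return hasCloudFallback ? "cloud" : "skip-feature";
  }
}

// Hypothetical minimal view of the API surface, injected so the sketch runs anywhere.
interface ModelProvider {
  availability(options: {
    expectedInputs: { type: "text" | "image" | "audio" }[];
  }): Promise<Availability>;
}

// States the required modality up front, then picks a strategy.
async function planImageCaptioning(
  provider: ModelProvider | undefined,
  hasCloudFallback: boolean,
): Promise<Strategy> {
  if (provider === undefined) {
    // The browser does not expose the API at all.
    return chooseStrategy("unavailable", hasCloudFallback);
  }
  const availability = await provider.availability({ expectedInputs: [{ type: "image" }] });
  return chooseStrategy(availability, hasCloudFallback);
}
```

Keeping the decision in a pure function makes the fallback behavior easy to unit test without a model present. In a page, the provider argument would be whatever object the browser exposes, if any.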
Gecko: No signal
(https://github.com/mozilla/standards-positions/issues/1213)
WebKit: No signal
(https://github.com/WebKit/standards-positions/issues/495)
Web developers: Strongly positive
(https://github.com/webmachinelearning/prompt-api/blob/main/README.md#stakeholder-feedback)
Other signals: We are also working with Microsoft Edge developers on
this feature, with them contributing the structured output functionality.
Activation
This feature would definitely benefit from having polyfills, backed by
any of: cloud services, lazily-loaded client-side models using WebGPU,
or the web developer's own server. We anticipate seeing an ecosystem
of such polyfills grow as more developers experiment with this API.
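As a rough illustration of the polyfill idea, the sketch below installs a stand-in only when no native implementation is present, and forwards prompts to a developer-supplied backend (for example a cloud service). The LanguageModelLike and SessionLike shapes are simplified assumptions rather than the real interface, and installPromptPolyfill and Backend are hypothetical names.

```typescript
// Simplified, assumed shapes; the real API surface is richer than this.
interface SessionLike {
  prompt(input: string): Promise<string>;
}

interface LanguageModelLike {
  availability(): Promise<string>;
  create(): Promise<SessionLike>;
}

// Hypothetical backend: a cloud endpoint, the developer's own server, etc.
type Backend = (input: string) => Promise<string>;

// Installs the stand-in on `target` unless a native implementation exists.
// Returns true if the polyfill was installed, false if it was left alone.
function installPromptPolyfill(
  target: { LanguageModel?: LanguageModelLike },
  backend: Backend,
): boolean {
  if (target.LanguageModel !== undefined) {
    return false; // A native (or earlier) implementation wins.
  }
  target.LanguageModel = {
    // The backend is assumed to be reachable, so report "available".
    availability: () => Promise.resolve("available"),
    create: () => Promise.resolve({ prompt: (input: string) => backend(input) }),
  };
  return true;
}
```

Passing the target object explicitly, instead of writing to the global directly, keeps the sketch testable outside a browser.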
WebView application risks
Does this intent deprecate or change behavior of existing APIs, such
that it has potentially high risk for Android WebView-based applications?
None
Goals for experimentation
Validate the technical implementation and developer experience of
multimodal inputs with a broader audience and actual usage.
Assess how structured output improves ergonomics and could address
interoperability concerns between implementations (e.g. different
underlying models).
Gather extensive feedback from a wide range of web developers rooted
in real world usage.
Identify diverse and innovative use cases to inform a roadmap of task
APIs.
Ongoing technical constraints
None
Debuggability
It is possible that giving DevTools more insight into the
nondeterministic states of the model, e.g. random seeds, could help
with debugging. See discussion at
https://github.com/webmachinelearning/prompt-api/issues/74.
We also have some internal debugging pages which give more detail on
the model's status, e.g. chrome://on-device-internals, and parts of
these might be suitable to port into DevTools.
Will this feature be supported on all six Blink platforms
(Windows, Mac, Linux, ChromeOS, Android, and Android WebView)?
No
Not all platforms will come with a language model. In particular, in
the initial stages we are focusing on Windows, Mac, and Linux.
Is this feature fully tested by web-platform-tests
<https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_platform_tests.md>?
No
We plan to write web platform tests for the API surface as much as
possible. The core responses from the model will be difficult to test,
but some facets are testable, e.g. the adherence to structured output
response constraints.
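To make the "testable facet" concrete, here is a minimal sketch of checking that a model response adheres to a structured output constraint. It handles only a tiny, assumed subset of JSON Schema and is not how web platform tests or Chrome actually validate responses; MiniSchema, conforms, and responseConforms are hypothetical names.

```typescript
// A deliberately small subset of JSON Schema, for illustration only.
type MiniSchema =
  | { type: "string"; enum?: string[] }
  | { type: "number" }
  | { type: "boolean" }
  | { type: "array"; items: MiniSchema }
  | { type: "object"; properties: Record<string, MiniSchema>; required?: string[] };

// Returns true if `value` matches `schema` under the subset above.
function conforms(value: unknown, schema: MiniSchema): boolean {
  switch (schema.type) {
    case "string":
      return (
        typeof value === "string" &&
        (schema.enum === undefined || schema.enum.indexOf(value) !== -1)
      );
    case "number":
      return typeof value === "number" && isFinite(value);
    case "boolean":
      return typeof value === "boolean";
    case "array":
      return Array.isArray(value) && value.every((item) => conforms(item, schema.items));
    case "object": {
      if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
      const record = value as Record<string, unknown>;
      // Every required key must be present.
      for (const key of schema.required ?? []) {
        if (!(key in record)) return false;
      }
      // Every declared property that is present must match its sub-schema.
      for (const key of Object.keys(schema.properties)) {
        if (key in record && !conforms(record[key], schema.properties[key])) return false;
      }
      return true;
    }
  }
}

// Parses the raw model output and checks it; unparseable output never conforms.
function responseConforms(raw: string, schema: MiniSchema): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false;
  }
  return conforms(parsed, schema);
}
```

A test along these lines can assert on the shape of the output without depending on what the model actually says, which is the part that varies across models and versions.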
Flag name on about://flags
prompt-api-for-gemini-nano-multimodal-input
Finch feature name
AIPromptAPIMultimodalInput
Requires code in //chrome?
True
Tracking bug
https://issues.chromium.org/issues/417530643
Measurement
We have various use counters for the API, e.g. LanguageModel_Create
Non-OSS dependencies
Does the feature depend on any code or APIs outside the Chromium open
source repository and its open-source dependencies to function?
Yes: this feature depends on a language model, which is bridged to the
open-source parts of the implementation via the interfaces in
//services/on_device_model.
Estimated milestones
Origin trial desktop first
139
Origin trial desktop last
144
DevTrial on desktop
137
DevTrial on Android
137
Anticipated spec changes
Open questions about a feature may be a source of future web compat or
interop issues. Please list open issues (e.g. links to known github
issues in the project for the feature specification) whose resolution
may introduce web compat/interop risk (e.g., changing the naming or
structure of the API in a non-backward-compatible way).
https://github.com/webmachinelearning/prompt-api/issues/42 is
somewhat worth keeping an eye on, but we believe a forward-compatible
approach is possible by just providing constant min = max values.
Link to entry on the Chrome Platform Status
https://chromestatus.com/feature/5134603979063296?gate=5106702730657792
Links to previous Intent discussions
Intent to Prototype:
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAM0wra_LXU8KkcVJ0x%3DzYa4h_sC3FaHGdaoM59FNwwtRAsOALQ%40mail.gmail.com
This intent message was generated by Chrome Platform Status
<https://chromestatus.com/>.
--
You received this message because you are subscribed to the Google
Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to blink-dev+unsubscr...@chromium.org.
To view this discussion visit
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAM0wra9oT0jygAYT00WPp0_wtZ-znrB2OdZ6GQb%2B3thFLP19pA%40mail.gmail.com.