Re: [blink-dev] Intent to Prototype: Document Local Dictionary API

Ziran Sun Fri, 26 Sep 2025 08:00:32 -0700

Hi Rick, Daniel,

I'm looking at the case of a non-persistent, document-local dictionary 
that stores the word list in memory. Would it be okay to elaborate a bit on 
why Blink>DOM might not be the right component for this? And what issues do 
you foresee in ensuring the data is reliably per-document?
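
To make the question concrete, below is a minimal TypeScript sketch of the 
per-document, in-memory model. It is illustrative only and not the 
explainer's API or Blink's implementation: `DocumentKey`, `addWord`, 
`removeWord` and `hasWord` are hypothetical names, and a plain object 
stands in for a document. The point is that the word list is keyed by 
document rather than shared by everything in one renderer process.

```typescript
// Hypothetical sketch: none of these names come from the explainer or Blink.
type DocumentKey = object; // stand-in for a Document

// One word set per document. A WeakMap lets an entry be collected together
// with its document, so nothing persists beyond the document's lifetime.
const dictionaries = new WeakMap<DocumentKey, Set<string>>();

// Look up (or lazily create) the word set that belongs to one document.
function wordsFor(doc: DocumentKey): Set<string> {
  let words = dictionaries.get(doc);
  if (words === undefined) {
    words = new Set<string>();
    dictionaries.set(doc, words);
  }
  return words;
}

function addWord(doc: DocumentKey, word: string): void {
  wordsFor(doc).add(word);
}

// Returns true if the word was present and has been removed.
function removeWord(doc: DocumentKey, word: string): boolean {
  return wordsFor(doc).delete(word);
}

function hasWord(doc: DocumentKey, word: string): boolean {
  return dictionaries.get(doc)?.has(word) ?? false;
}

// Two documents that could live in the same renderer process.
const docA: DocumentKey = {};
const docB: DocumentKey = {};
addWord(docA, "NVDA");
console.log(hasWord(docA, "NVDA")); // true
console.log(hasWord(docB, "NVDA")); // false: docB never sees docA's words
```

With a single process-wide set instead of the per-document map, the last 
lookup would return true, which is the kind of leak between documents that 
the per-document requirement is meant to rule out.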


Thank you!

Ziran

On Tuesday, 22 July 2025 at 22:02:41 UTC+1 Rick Byers wrote:

> On Tue, Jul 22, 2025 at 3:18 PM Stephen Chenney <sche...@chromium.org> 
> wrote:
>
>> Regarding motivation, our client has financial data, such as 
>> stock symbols and company names. There are similar use cases for medical 
>> data, fan fiction, or anything else with words that might not appear in 
>> hunspell's dictionaries. It's conceivable that the Google internal spelling 
>> APIs have these words but clients may be very reluctant to send their 
>> strings to Google.
>>
>> The proposal in this intent is relatively straightforward to implement, 
>> and its privacy and security are relatively simple to assess. But for 
>> developers there will probably be significant load-time costs around it, to 
>> fetch the site's dictionary and process it to add the words.
>>
>
> I'd love to see some figures on this. Maybe a bulk add API would be 
> enough? As a quick example I picked a random website (bloomberg.com) and 
> found it downloaded 3.4MB compressed including a number of individual 
> scripts, images and JSON blobs which were around 100kB compressed each. In 
> contrast the entire american-english dictionary on my linux machine 
> compresses down to 270kB. So as long as we're talking about something 
> that's less than 10% the size of the whole american english dictionary, my 
> hunch is that the transfer cost will be insignificant and lost in the 
> noise. Still, an HTTP approach that at least enables caching would be a 
> good idea with little downside. I could imagine, for example, a 
> <link rel=dictionary> tag or something that would be even simpler than this 
> JS API approach? 
>
> Anyway, these are just random thoughts to try to nudge away from premature 
> optimization, not API owner input or anything :-).
>
>> We have some ideas around that in future work but nothing concrete. I 
>> think we'll have to address it before we ship.
>>
>> An HTTP header approach would make the ergonomics easier (assuming the 
>> infrastructure for setting up a spelling server is reasonably standard) and 
>> fits better into the existing code, but it would not work offline. Maybe 
>> the approaches are complementary and we do both.
>>
>> I'll try to get some idea of the size of typical dictionaries in this 
>> space. It is important to know.
>>
>> Cheers,
>> Stephen.
>>
>> On Tue, Jul 22, 2025 at 12:03 PM Rick Byers <rby...@chromium.org> wrote:
>>
>>> A spelling server seems a lot harder to get right to me, with obviously more 
>>> to worry about regarding privacy etc. Can you share anything more about the 
>>> motivating use cases here? Like how large do these custom dictionaries tend 
>>> to be? I'd guess that for even dictionaries up to 1MB compressed it's 
>>> probably faster and simpler to just have the client download the whole 
>>> thing. RTT latency is generally a bigger performance problem these days 
>>> than raw throughput. But if it's important to solve scenarios with really 
>>> large dictionaries then maybe it's worth exploring?
>>>
>>> On Tue, Jul 22, 2025 at 11:11 AM Stephen Chenney <sche...@chromium.org> 
>>> wrote:
>>>
>>>> Thanks for the early feedback, and sorry for the lack of clarity on the 
>>>> explainer. We're working on improving the explainer to address the issues 
>>>> raised here and issues raised on GitHub.
>>>>
>>>> We're also considering an entirely different approach whereby a site 
>>>> provides a "spelling server" URL in the HTML header. That would operate 
>>>> more like the existing "send it to Google" spell checking options. We're 
>>>> super early in designing such a thing, but if anyone has early feedback on 
>>>> that approach we would be interested.
>>>>
>>>> Cheers,
>>>> Stephen.
>>>>
>>>> On Tue, Jul 22, 2025 at 10:54 AM Rick Byers <rby...@chromium.org> 
>>>> wrote:
>>>>
>>>>> FWIW I was also a little confused reading the explainer, but I think I 
>>>>> understand the overall design and I think it's a good one: these 
>>>>> dictionaries are transient and document-local, simply a mechanism to let 
>>>>> pages selectively suppress spell check violations on their own page.
>>>>>
>>>>> Presumably the discussion of network fetches in the explainer is just 
>>>>> about the app fetching from its server (not fetches in the browser), and 
>>>>> all the discussions of "persistent" storage are under the "future work" 
>>>>> section, so it's fine to me that there's no detail here (it's out of scope 
>>>>> because it's hard). I'm not sure whether it would make sense to extend 
>>>>> this design into persistent storage or not, but I'm also not sure it 
>>>>> matters (as the explainer says, it's simply an optimization, for a problem 
>>>>> that may or may not exist in practice, so not worth worrying about today). 
>>>>>
>>>>> Ensuring the data is reliably per-document is definitely a key 
>>>>> implementation concern, so I agree with you there Daniel. And yes we'll 
>>>>> eventually want signals from other browser vendors, but our process 
>>>>> <https://www.chromium.org/blink/launching-features/> has that step 
>>>>> only after prototyping is complete (often we learn a lot about the design 
>>>>> from prototyping), so it's premature to ask for it now at the I2P phase. 
>>>>>
>>>>> Cheers,
>>>>>   Rick 
>>>>>
>>>>> On Tue, Jul 22, 2025 at 7:37 AM 'Daniel Vogelheim' via blink-dev <
>>>>> blin...@chromium.org> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> This intent came up in security review, and I'm mostly confused:
>>>>>>
>>>>>> - The explainer mostly seems to assume that these are stored in-memory, 
>>>>>> per-document. But it also talks about the absence of cross-origin 
>>>>>> requests, only to add info about CORS, which only makes sense for 
>>>>>> cross-origin requests.
>>>>>> - There are multiple references to loading data, but there is no 
>>>>>> explanation about what kind of network requests are being made when or 
>>>>>> where.
>>>>>> - The explainer suggests "Persistently store data" as an optimization 
>>>>>> for having to re-load large dictionaries. Again, no information about 
>>>>>> which requests are being optimized away.
>>>>>> - In "Data Storage" it is pointed out that CustomDictionaryEngine 
>>>>>> exists per renderer process. While renderer processes mostly don't have 
>>>>>> cross-origin data, they sometimes do. And they may hold multiple 
>>>>>> documents. This seems inconsistent with information being stored 
>>>>>> per-document.
>>>>>>
>>>>>> Non-security feedback:
>>>>>> - Since this is a web-exposed API, I'd have expected some attempt at 
>>>>>> checking with other browser engines on support.
>>>>>> - I do not understand the "High-level Architecture". It seems to 
>>>>>> feature a stack of methods that feeds into yes/no decisions, which feed 
>>>>>> into a storage thing. I have no idea what this is meant to convey.
>>>>>> - Blink>DOM might not be the right component for this.
>>>>>>
>>>>>>
>>>>>> Could you please update the documentation to be more clear about 
>>>>>> where data is stored, and about which network requests are being made?
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 18, 2025 at 12:08 PM Chromestatus <
>>>>>> ad...@cr-status.appspotmail.com> wrote:
>>>>>>
>>>>>>> Contact emails ji...@igalia.com 
>>>>>>>
>>>>>>> Explainer 
>>>>>>> https://github.com/Igalia/explainers/tree/main/dictionary-api 
>>>>>>>
>>>>>>> Specification None 
>>>>>>>
>>>>>>> Design docs 
>>>>>>>
>>>>>>> https://github.com/Igalia/explainers/tree/main/dictionary-api#-proposal 
>>>>>>>
>>>>>>> Summary 
>>>>>>>
>>>>>>> The proposed APIs enable users to modify the document local 
>>>>>>> dictionary in the browser. Users can add, remove, and check words in 
>>>>>>> the document local dictionary. This feature ensures the browser does 
>>>>>>> not mark words in the document local dictionary as spelling errors.
>>>>>>>
>>>>>>>
>>>>>>> Blink component Blink>DOM 
>>>>>>> <https://issues.chromium.org/issues?q=customfield1222907:%22Blink%3EDOM%22>
>>>>>>>  
>>>>>>>
>>>>>>> Motivation 
>>>>>>>
>>>>>>> Some words need to be added to the document custom dictionary so 
>>>>>>> that the browser does not mark them as spelling errors. The added words 
>>>>>>> need to be removed at some point if they are no longer necessary. 
>>>>>>> Current specs, such as the element.spellcheck attribute and the 
>>>>>>> ::spelling-error CSS pseudo-element, manage the words already in the 
>>>>>>> dictionary. Therefore, a new API would be needed to manipulate the 
>>>>>>> document local dictionary.
>>>>>>>
>>>>>>>
>>>>>>> Initial public proposal None 
>>>>>>>
>>>>>>> TAG review None 
>>>>>>>
>>>>>>> TAG review status Pending 
>>>>>>>
>>>>>>> Risks 
>>>>>>>
>>>>>>>
>>>>>>> Interoperability and Compatibility 
>>>>>>>
>>>>>>> None
>>>>>>>
>>>>>>>
>>>>>>> *Gecko*: No signal 
>>>>>>>
>>>>>>> *WebKit*: No signal 
>>>>>>>
>>>>>>> *Web developers*: No signals 
>>>>>>>
>>>>>>> *Other signals*: 
>>>>>>>
>>>>>>> WebView application risks 
>>>>>>>
>>>>>>> Does this intent deprecate or change behavior of existing APIs, such 
>>>>>>> that it has potentially high risk for Android WebView-based 
>>>>>>> applications?
>>>>>>>
>>>>>>> None
>>>>>>>
>>>>>>>
>>>>>>> Debuggability 
>>>>>>>
>>>>>>> None
>>>>>>>
>>>>>>>
>>>>>>> Is this feature fully tested by web-platform-tests 
>>>>>>> <https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_platform_tests.md>
>>>>>>> ? Yes 
>>>>>>>
>>>>>>> third_party/blink/web_tests/wpt_internal/dom/local-dictionary/* 
>>>>>>> There is a WIP patch that includes the tests.
>>>>>>>
>>>>>>>
>>>>>>> Flag name on about://flags None 
>>>>>>>
>>>>>>> Finch feature name None 
>>>>>>>
>>>>>>> Non-finch justification None 
>>>>>>>
>>>>>>> Requires code in //chrome? False 
>>>>>>>
>>>>>>> Tracking bug https://issues.chromium.org/issues/428005649 
>>>>>>>
>>>>>>> Estimated milestones 
>>>>>>>
>>>>>>> No milestones specified
>>>>>>>
>>>>>>>
>>>>>>> Link to entry on the Chrome Platform Status 
>>>>>>> https://chromestatus.com/feature/6185007701557248?gate=4503614776934400 
>>>>>>>
>>>>>>> This intent message was generated by Chrome Platform Status 
>>>>>>> <https://chromestatus.com>. 
>>>>>>>
>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "blink-dev" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to blink-dev+...@chromium.org.
>>>>>>> To view this discussion visit 
>>>>>>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/687a1d04.170a0220.2dad83.0168.GAE%40google.com
>>>>>>>  
>>>>>>> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/687a1d04.170a0220.2dad83.0168.GAE%40google.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>>
>>>>>

