Re: Maintaining the en-US dictionary that ships with Mozilla products

Jörg Knobloch Wed, 30 Dec 2015 01:21:08 -0800

On 30/12/2015 01:46, Ehsan Akhgari wrote:

First things first, let's correct something here.  We do _not_ maintain
three word lists.  We maintain one list: the list of words that the
Firefox spellchecker accepts.

I know I sound like a broken record: I suggested changing the process and maintaining three lists.

I'm afraid you're misunderstanding what's happened here.  We only
maintain one word list, and our process of merging upstream changes is
purely additive.  As a result, it doesn't handle the case where a word
disappears from SCOWL.

This is clearly a bug, and should be fixed.
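
To make the "purely additive" problem concrete, here is a minimal sketch in Python. It is not the actual Mozilla merge script; the function name and the toy words are made up for illustration.

```python
def additive_merge(shipped, upstream):
    """Purely additive merge: add everything upstream has, drop nothing."""
    return set(shipped) | set(upstream)


# Toy data, not real dictionary contents.
# "oldword" came from upstream in an earlier import; "mozword" was added locally.
shipped = {"apple", "oldword", "mozword"}
# Upstream has since dropped "oldword" and added "banana".
upstream = {"apple", "banana"}

merged = additive_merge(shipped, upstream)

# "oldword" survives the merge even though upstream removed it, and nothing in
# the single merged list records whether "oldword" or "mozword" came from upstream.
print(sorted(merged))
```

With only one list, a removal step cannot tell the two leftover words apart, so it either keeps both or drops both.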

I came to realise that my argument has a hole. On the one hand I'm complaining that words got removed at the beginning of May 2015, see:

https://hg.mozilla.org/mozilla-central/diff/bcb133a3cdca/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/orig/en_US.dic
(don't open in Firefox, it will hang, bug 1235321):
-relict
-residuary
-enforceability
(all still included in the "large" dataset).

On the other hand I'm complaining that wrong entries, like "remind's", are maintained in the Mozilla data. "remind's" is not a valid word in SCOWL (http://app.aspell.net/lookup?dict=en_US&words=remind%27s), but it is in Mozilla. So there is a bug in the removal process.

Frankly, I can't understand how the current system could manage SCOWL removals yet not remove words Mozilla specifically added. How does it know whether a word came from SCOWL and can be removed, or didn't come from SCOWL and should be kept? Broken record: Maintaining Mozilla words separately (three lists) would fix this.
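
A rough sketch of what I mean, in Python. The function names and the toy words are hypothetical, and the first two list names are placeholders; only the third list (general words requested upstream) is taken directly from the proposal.

```python
def build_shipped(upstream, *mozilla_lists):
    """Shipped dictionary = current upstream words plus every Mozilla-kept list."""
    shipped = set(upstream)
    for words in mozilla_lists:
        shipped |= set(words)
    return shipped


def pending_upstream(upstream, requested):
    """Words on the third list that upstream has not accepted yet."""
    return sorted(set(requested) - set(upstream))


# Toy data, not real dictionary contents.
mozilla_list_1 = {"mozword"}        # placeholder for the first class of additions
mozilla_list_2 = {"otherword"}      # placeholder for the second class of additions
requested_upstream = {"newword"}    # third list: general words requested upstream

upstream_before = {"apple", "oldword"}
upstream_after = {"apple", "banana"}  # upstream dropped "oldword"

before = build_shipped(upstream_before, mozilla_list_1, mozilla_list_2, requested_upstream)
after = build_shipped(upstream_after, mozilla_list_1, mozilla_list_2, requested_upstream)

# "oldword" disappears with the refresh because it is on no Mozilla list,
# while the Mozilla additions are untouched.
print(sorted(before - after))
print(pending_upstream(upstream_after, requested_upstream))
```

Because the shipped file is rebuilt from its sources on every refresh, an upstream removal takes effect automatically, and keeping a dropped word is a deliberate act of putting it on one of the Mozilla lists.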

We may still decide to keep
individual words that SCOWL drops if we decide that we want the Firefox
spell checker to accept them, but as a general rule we should probably
follow upstream.

It is pretty much unmanageable to do this. On every refresh you would have to re-add the removed words manually. Broken record: Mozilla should not manage the general English words (apart from some exceptions, see below).

We should leave it to SCOWL
to manage the plain English dictionary and only manage the Mozilla
additions (for which I see three classes, see above).

I disagree.  I think we should accept the words that we want, and then
try to upstream them to SCOWL, without holding Firefox back until that
happens.  I experimented with this once
<https://github.com/kevina/wordlist/issues/117> but unfortunately I
haven't had the time to go through all of the list.  (As a non-native
speaker this task requires me to spend weeks looking things up in
dictionaries!)

I sound like a broken record, but you ignored my proposal: to facilitate the process of having more words than SCOWL, I proposed splitting these "more words" into three files. The third file would contain the "general" words we request upstream.

Wonderful!  If you have a list of words using these types of characters
that we need to add, please file a bug, and let's do that!

No, I won't do that. I filed a bug to use the "large" dictionary, but you even changed the summary and hijacked it for something else. It makes no sense to request a heap of words to be added to the Mozilla dictionary, like "résumé", "née" and so on, which already exist in the "large" dataset. Broken record: Mozilla doesn't want to be in the business of managing this. Mozilla should be in the business of managing Mozilla-specific additions, and perhaps a small number of general words that get added (third list), which will then be requested upstream.

The current system can't do this and you resist changing it, so I just give up, since I'm not using the defective en-US spelling anyway.


Jorg K.

_______________________________________________
dev-platform mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-platform
