On 2015-12-28 2:31 PM, Jörg Knobloch wrote:
Recently I was browsing some bugs in "Core::Spelling checker" and much
to my surprise found four bugs where people complained about wrong or
missing words in the en-US dictionary. There were two bugs where people
complained about words in the German and the French dictionaries.

The German and French bugs were finally closed as "wontfix" and
"invalid" and referred back to the respective dictionary maintainers.
For French there is a very good approach: The French dictionaries are
maintained via this site: http://www.dicollecte.org/ and imported for
distribution with the French version of Firefox. The situation for
German is not as good, but there is a maintainer whose work is then
turned into an add-on (in fact, sadly, two competing ones).

As you have discovered, we don't ship any non-en-US dictionaries with Firefox, so the above is off topic for this mailing list.

I was extremely surprised that Mozilla maintains a version of the en-US
dictionary, and you can see the movements here:
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic


Basically Ekanan Ketunuti does merges from upstream providers (SCOWL)
but also adds words individually and Ehsan reviews each change.

That's incorrect. I periodically merge from SCOWL, and Ekanan regularly submits patches for words missing from SCOWL (and from our en-US dictionary).

I think this situation is less than ideal. Firstly, I don't think we
should spend time on individual additions

I disagree! The quality of the dictionary we ship with Firefox matters, and the process for adding new words seems to be working well.

> and secondly, this process
creates quite some unwanted variations (to avoid using the word "mess").
For example the en-US dictionary add-on available at AMO contains many
accented words loaned from other languages, like "Bogotá" or "cliché"
(both with Wikipedia entries), which the Mozilla dictionary is missing.

That is not a problem with the word list; it's an issue with the en-US dictionary being encoded in ISO-8859-1.

Also, subtle differences are created, for example, the add-on dictionary
has "(in/un)feasible" and "(un)feasibly", whereas the Mozilla version
only had "(un)feasible" and "feasibly" (no prefix). A bug is necessary
to correct this.

Not sure what you mean. Of course, a word list can have bugs. Once you find these issues, you can report bugs, and/or submit patches. (The same goes for SCOWL, FWIW.)

> Thirdly, the add-on dictionary contains 13% more words
than the Mozilla maintained dictionary, and I think in dictionaries,
bigger is better.

I'm not sure what the "add-on dictionary" is. But FWIW you're wrong in assuming that bigger is better, both for the reason that Aryeh described and because the format of hunspell dictionaries is not a simple list of words, so comparing the sizes of two dictionaries gives you no information about which one contains more words.
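
To illustrate the point about the format, here is a minimal sketch in Python of why entry counts and word counts diverge. It is a toy model only: the flag letters and the AFFIX_RULES table are made up for this example and are not taken from our en-US.aff, and real hunspell affix rules (strip strings, conditions, prefix/suffix cross products) are considerably richer than this.

```python
# Toy model of affix-compressed dictionary entries, loosely modelled on
# hunspell's "stem/FLAGS" lines. The flag letters and rules below are
# hypothetical and exist only for illustration.

# Hypothetical affix rules: flag -> (kind, affix text).
AFFIX_RULES = {
    "U": ("prefix", "un"),
    "I": ("prefix", "in"),
    "S": ("suffix", "s"),
}


def expand_entry(entry: str) -> set[str]:
    """Return the stem plus one derived form per flag on the entry."""
    stem, _, flags = entry.partition("/")
    words = {stem}
    for flag in flags:
        kind, affix = AFFIX_RULES[flag]
        words.add(affix + stem if kind == "prefix" else stem + affix)
    return words


def count_words(entries: list[str]) -> int:
    """Count the distinct surface words generated by a list of entries."""
    words: set[str] = set()
    for entry in entries:
        words |= expand_entry(entry)
    return len(words)


# Two toy dictionaries: the first has fewer entries but yields more words.
compact = ["feasible/UI", "zucchini/S"]
flat = ["feasible", "unfeasible", "zucchini"]

print(len(compact), count_words(compact))
print(len(flat), count_words(flat))
```

In this toy case the two-entry list expands to more distinct words than the three-entry list, so neither the entry count nor the file size says which dictionary accepts more words.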

> For example, the Mozilla dictionary only knows
"zucchini", whereas the add-on dictionary also knows "Zulu" and other
words starting with "zu". I'd hate to think that we'd need to create
7265 bugs to add all the missing words.

Filing a single bug for all of those words and attaching them works just fine.

Is there a better way to do this? I think this is tedious business and
Mozilla should get out of it.

As the de facto maintainer of our en-US dictionary, I'm not sure where you're getting this information from, but your conclusions are unjustified. The current process seems to be working well. As far as I can tell, your objections boil down to having found some missing words, which is a great thing to file a bug about (please CC Ekanan).

I see no reason for Mozilla to stop maintaining and shipping the en-US dictionary.

Cheers,
Ehsan
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
