On 29/12/2015 18:23, Ehsan Akhgari wrote:
> I see no reason for Mozilla to stop maintaining and shipping the en-US dictionary.
Agreed. But we should take a different approach. I disagree that the current process is working well since it carries forward legacy errors.

I must admit that my original post was somewhat unfortunate since I wasn't fully aware of the Mozilla process. It would be great if Mozilla could just obtain a suitable dictionary from a third party and ship it. Sadly that's not the case.

The practice is that Mozilla uses the SCOWL/Aspell word list and adds Mozilla "special" words to it. Details can be found in bug 1235506.

My first point is: We're currently using SCOWL's "small" dictionary, from which a bunch of words recently disappeared. So we get bugs asking for words to be added that were previously included and are also included in the "large" dictionary that is available.
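A rough way to check whether a requested word falls into this category is a plain set comparison. This is only a sketch: it assumes each word list has been exported to plain text with one word per line, and the function name and sample words are made up for illustration, not taken from SCOWL.

```python
def dropped_but_in_large(old_small, new_small, large):
    """Return words that were in the old "small" list, are missing from
    the new "small" list, and are still present in the "large" list."""
    dropped = set(old_small) - set(new_small)
    return sorted(dropped & set(large))


# Hypothetical sample data, not real SCOWL content.
old_small = ["apple", "banana", "cherry"]
new_small = ["apple", "cherry"]
large = ["apple", "banana", "cherry", "damson"]

print(dropped_but_in_large(old_small, new_small, large))
```

Any word this returns would be covered simply by switching to the "large" list, without a manual addition on the Mozilla side.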

The second point is that we're not managing Mozilla-specific additions well. There are about 12000 (questionable) proper names that Mozilla adds and about 1000 extra terms, some of which are grossly wrong. Here is just a random excerpt:
derail's
derange's
deride's
desalt's
descale's
describe's
deserve's
deskill's
despoil's
detest's
dethrone's
detract's
devalue's
devote's
All these are wrong! You can write: "This remind's me of you" without that being flagged as a mistake! Most likely they were imported once upon a time, corrected at the source, but never removed from Mozilla's version. All extra content in https://dxr.mozilla.org/mozilla-central/source/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/5-mozilla-added should be reviewed and classified. Again, details in bug 1235506.
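A first pass at that review could be automated. The sketch below only produces candidates for a human to look at. It assumes the added entries are one per line, possibly followed by "/flags", and that the upstream word list is available as plain words; both are assumptions about the file layout, and the function name and sample data are hypothetical. It flags lowercase possessives that upstream does not contain. It knows nothing about parts of speech, so a legitimate noun possessive missing from upstream would be flagged too.

```python
def suspicious_possessives(added_entries, upstream_words):
    """Flag lowercase "'s" entries from the added list that are
    absent from the upstream word list (candidates for review only)."""
    upstream = set(upstream_words)
    flagged = []
    for entry in added_entries:
        # Drop an optional "/flags" suffix (assumed format).
        word = entry.strip().split("/", 1)[0]
        if not word.endswith("'s"):
            continue
        base = word[:-2]
        # Skip capitalised bases: proper names can take a possessive.
        if base[:1].islower() and word not in upstream:
            flagged.append(word)
    return flagged


# Hypothetical sample data for illustration.
added = ["derail's", "Mozilla's", "devote's", "dog's", "Firefox"]
upstream = ["derail", "devote", "dog", "dog's"]

print(suspicious_possessives(added, upstream))
```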

I am proposing to change the way the Mozilla dictionary is maintained, to keep manual intervention to a minimum and the quality to a maximum. I'm glad that Ehsan agrees that the quality is important. Sadly, we're currently not delivering a quality dictionary.

Just one more remark: The "large" dictionary I'm proposing to use is ISO-8859-1 encoded (like the "small" one) and contains many words with accents, including all the ones mentioned in the original post. So there is no problem.

Jorg K.

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
