On 29/12/2015 18:23, Ehsan Akhgari wrote:
> I see no reason for Mozilla to stop maintaining and shipping the en-US
> dictionary.
Agreed. But we should take a different approach. I disagree that the
current process is working well since it carries forward legacy errors.
I must admit that my original post was somewhat unfortunate since I
wasn't fully aware of the Mozilla process. It would be great if Mozilla
could just obtain a suitable dictionary from a third party and ship it.
Sadly that's not the case.
The practice is that Mozilla uses the SCOWL/Aspell word list and adds
Mozilla "special" words to it. Details can be found in bug 1235506.
My first point is: We're currently using SCOWL's "small" dictionary, from
which a bunch of words recently disappeared. So we get bugs asking for
words to be added that were previously included and are also
included in the "large" dictionary that is available.
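To make the first point checkable, here is a minimal sketch of how one could list the words that dropped out of the "small" list but are still present in the "large" one. It assumes the lists are either plain one-word-per-line files or Hunspell .dic files (a count on the first line, then word/FLAGS entries). The function names and the sample words are made up for illustration and are not taken from the actual SCOWL data.

```python
def parse_entries(lines):
    """Return the set of bare words from word-list or Hunspell .dic lines."""
    words = set()
    for line in lines:
        entry = line.strip()
        # Skip blank lines and the leading entry count of a .dic file.
        if not entry or entry.isdigit():
            continue
        # Drop Hunspell affix flags, e.g. "alpha/S" -> "alpha".
        words.add(entry.split("/", 1)[0])
    return words


def dropped_but_available(old_small, new_small, large):
    """Words in the old small list, missing from the new one, present in large."""
    dropped = parse_entries(old_small) - parse_entries(new_small)
    return sorted(dropped & parse_entries(large))


# Hypothetical sample data, for illustration only.
old_small = ["3", "alpha/S", "beta", "gamma"]
new_small = ["2", "alpha/S", "gamma"]
large = ["4", "alpha", "beta", "gamma", "delta"]
print(dropped_but_available(old_small, new_small, large))
```

With real files, one would read them with open(path, encoding="iso-8859-1") and pass the file objects in place of the sample lists.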
The second point is that we're not managing Mozilla-specific additions
well. There are about 12000 (questionable) proper names that Mozilla
adds and about 1000 extra terms, some of which are grossly wrong. Here is
just a random excerpt:
derail's
derange's
deride's
desalt's
descale's
describe's
deserve's
deskill's
despoil's
detest's
dethrone's
detract's
devalue's
devote's
All these are wrong! You can write: "This remind's me of you" without
that being flagged as a mistake! Most likely they were imported once
upon a time, corrected at the source, but never removed from Mozilla's
version.
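Entries like the ones listed above could be picked out for review with a small script. What follows is only a rough sketch under my own assumptions: it flags every entry that ends in 's and has an all-lowercase base (so proper names are skipped), and it optionally narrows the result to a caller-supplied set of verbs. The function name, the known_verbs parameter, and the sample entries other than those quoted above are hypothetical. A lowercase possessive can of course be legitimate for a noun, so the output is a list of candidates, not a verdict.

```python
def suspicious_possessives(entries, known_verbs=None):
    """Return lowercase "...'s" entries, optionally limited to known verbs."""
    flagged = []
    for entry in entries:
        # Drop Hunspell affix flags if present.
        word = entry.strip().split("/", 1)[0]
        if not word.endswith("'s"):
            continue
        base = word[:-2]
        # Skip empty bases and anything capitalized (likely a proper name).
        if not base or not base.islower():
            continue
        if known_verbs is None or base in known_verbs:
            flagged.append(word)
    return flagged


# "derail's" and "devote's" are from the excerpt; the rest is made up.
sample = ["derail's", "Mozilla's", "devote's", "website", "dog's"]
print(suspicious_possessives(sample))
print(suspicious_possessives(sample, known_verbs={"derail", "devote"}))
```

Without a verb list the script also reports "dog's", which is why a human review step remains necessary.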
All extra content in
https://dxr.mozilla.org/mozilla-central/source/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/5-mozilla-added
should be reviewed and classified. Again, details in bug 1235506.
I am proposing to change the way the Mozilla dictionary is maintained,
to keep manual intervention to a minimum and the quality to a maximum.
I'm glad that Ehsan agrees that the quality is important. Sadly, we're
currently not delivering a quality dictionary.
Just one more remark: The "large" dictionary I'm proposing to use is
ISO-8859-1 encoded (like the "small" one) and contains many words with
accents, including all the ones mentioned in the original post. So there
is no problem.
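For anyone who wants to verify the encoding point for a particular word, a quick check is to try encoding it as ISO-8859-1. This is just an illustrative sketch: the function name is mine, and the sample words are hypothetical examples, not the words from the original post.

```python
def not_encodable(words, encoding="iso-8859-1"):
    """Return the words that cannot be represented in the given encoding."""
    bad = []
    for word in words:
        try:
            word.encode(encoding)
        except UnicodeEncodeError:
            bad.append(word)
    return bad


# Hypothetical examples: the first three fit in ISO-8859-1, the last does not.
print(not_encodable(["café", "naïve", "résumé", "Dvořák"]))
```

Common Western European accented letters fit, while a letter such as "ř" lies outside ISO-8859-1 and would need a different encoding.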
Jorg K.