On 2015-12-29 3:34 PM, Jörg Knobloch wrote:
On 29/12/2015 18:23, Ehsan Akhgari wrote:
I see no reason for Mozilla to stop maintaining and shipping the en-US
dictionary.
Agreed. But we should take a different approach. I disagree that the
current process is working well since it carries forward legacy errors.
I must admit that my original post was somewhat unfortunate since I
wasn't fully aware of the Mozilla process. It would be great if Mozilla
could just obtain a suitable dictionary from a third party and ship it.
Sadly that's not the case.
The practice is that Mozilla uses the SCOWL/Aspell word list and adds
Mozilla "special" words to it. Details can be found in bug 1235506.
They are not Mozilla special words. They are words that we want to add
to our spell checking dictionary that don't exist in the upstream SCOWL
word list.
IOW, our en-US dictionary is a superset of the SCOWL en-US dictionary.
My first point is: We're currently using SCOWL's "small" dictionary, from
which a number of words recently disappeared. So we get bugs asking for
words to be added that were previously included and are still
included in the "large" dictionary that is available.
AFAIK the SCOWL project recommends against using the large word list for
spell checking. If you find evidence to the contrary, I would like to
know more about that.
(About the words that disappeared, please file a bug and attach the list
of the words.)
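A minimal sketch of how such a list could be produced, assuming the old and new upstream word lists and the Mozilla additions are each available as one word per entry. The function name and the sample words are hypothetical, not taken from the actual dictionary sources. Because the shipped dictionary is upstream plus additions, a word only counts as lost if neither of those still provides it:

```python
def disappeared_words(old_upstream, new_upstream, additions):
    """Return words that the old upstream list had but that are missing
    from both the new upstream list and the local additions."""
    # The shipped dictionary is a superset of upstream: upstream + additions.
    shipped = set(new_upstream) | set(additions)
    return sorted(set(old_upstream) - shipped)


# Hypothetical sample data; real lists would be read from the word list files.
old = ["apple", "derail", "naive", "zebra"]
new = ["apple", "derail"]
added = ["zebra"]

print(disappeared_words(old, new, added))  # prints ['naive']
```

The sorted output can be attached to a bug as is.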
The second point is that we're not managing Mozilla-specific additions
well. There are about 12000 (questionable) proper names that Mozilla
adds and about 1000 extra terms, some of which are grossly wrong. Here
is just a random excerpt:
derail's
derange's
deride's
desalt's
descale's
describe's
deserve's
deskill's
despoil's
detest's
dethrone's
detract's
devalue's
devote's
All these are wrong! You can write: "This remind's me of you" without
that being flagged as a mistake! Most likely they were imported once
upon a time, corrected at the source, but never removed from Mozilla's
version.
All extra content in
https://dxr.mozilla.org/mozilla-central/source/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/5-mozilla-added
should be reviewed and classified. Again, details in bug 1235506.
I welcome someone going through the list and reviewing the additions.
No matter what process we use for maintaining our word list, that will
be needed and appreciated.
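As a possible aid for that review, here is a rough sketch of one heuristic, not the process actually used for the dictionary. It assumes both lists are available as fully expanded words, and flags an added possessive when upstream has the base word but not the possessive form, which suggests upstream leaves that form out on purpose. The function name and sample words are hypothetical:

```python
def suspicious_possessives(additions, upstream):
    """Flag added "'s" forms whose base word is upstream but whose
    possessive form is not (a heuristic, not a definitive check)."""
    upstream_set = set(upstream)
    flagged = []
    for word in additions:
        if not word.endswith("'s"):
            continue
        base = word[:-2]  # strip the trailing "'s"
        if base in upstream_set and word not in upstream_set:
            flagged.append(word)
    return sorted(flagged)


# Hypothetical sample data standing in for the real word lists.
upstream = ["derail", "detest", "dog", "dog's"]
additions = ["derail's", "detest's", "Mozilla's", "Firefox"]

print(suspicious_possessives(additions, upstream))
# prints ["derail's", "detest's"]
```

Added names whose base word is unknown upstream, such as "Mozilla's" in the sample, are left alone, so the flagged entries still need a human decision.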
I am proposing to change the way the Mozilla dictionary is maintained,
to keep manual intervention to a minimum and the quality to a maximum.
I'm glad that Ehsan agrees that the quality is important. Sadly, we're
currently not delivering a quality dictionary.
I'm really lost on what problem in the process you are talking about.
Looks like you have found some issues in the word list, which is great.
But I don't see any of these having anything to do with the process
for updating the word list. If you're suggesting that we should not
maintain any additions on top of SCOWL, that is effectively asking for a
regression in the quality of our word list, and as such is unacceptable.
Just one more remark: The "large" dictionary I'm proposing to use is
ISO-8859-1 encoded (like the "small" one) and contains many words with
accents, including all the ones mentioned in the original post. So there
is no problem.
If you find a way to encode these accented characters properly, we can
add them to the word list that we maintain.
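To check which accented words fit the existing encoding, something like the following sketch could be used. It assumes the candidate words are available as Unicode strings; the function name and the sample words are hypothetical and are not the words from the original post. It relies only on Python's built-in "iso-8859-1" codec:

```python
def split_by_latin1(words):
    """Split words into those that can be encoded as ISO-8859-1 and
    those that cannot."""
    encodable, rejected = [], []
    for word in words:
        try:
            word.encode("iso-8859-1")
        except UnicodeEncodeError:
            rejected.append(word)
        else:
            encodable.append(word)
    return encodable, rejected


# Hypothetical sample words, written with escapes to avoid ambiguity.
samples = [
    "caf\u00e9",          # e with acute, U+00E9, in ISO-8859-1
    "na\u00efve",         # i with diaeresis, U+00EF, in ISO-8859-1
    "Dvo\u0159\u00e1k",   # r with caron, U+0159, not in ISO-8859-1
]

ok, bad = split_by_latin1(samples)
print(len(ok), len(bad))  # prints 2 1
```

Words in the second list would need a different dictionary encoding before they could be added.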