Recently I noticed that (thanks to Lior Kaplan, it seems) it is now trivial to get Hebrew spell-checking (based on Hspell 1.1) in OpenOffice. The Hebrew localized version (now available on the official OpenOffice site!) comes with Hebrew spell-checking pre-bundled, and there's an extension [1] for those who use the English version of OpenOffice.
However, when I actually used this spell checker, and observed my wife using it, I noticed two annoying problems in the way it works. I'm not sure if these are OpenOffice problems per se, or perhaps problems that should be solved in the context of hunspell, OpenOffice's spell-checking library. It is possible that changes to the dictionary file are all that is needed to solve these problems, but it is also possible that OpenOffice code needs to be changed. I simply don't know. I was hoping that someone here could help me figure this out, or at least point me to the right place to report these problems.

The first issue is acronyms (rashei tevot) and abbreviations. In Hebrew, these use the geresh and gershaim (or single or double quotes), which are part of the word. OpenOffice does not understand that these quotes are part of the Hebrew word, and splits the word on them. As a result, all acronyms are marked as spelling mistakes. This is really annoying, especially for certain types of documents where acronyms are common.

The second issue is the correction suggestions for spelling errors. All the suggestions indeed appear to be valid words, but their order is terrible - it appears little or no attention was paid to trying to provide the most likely suggestions first. The screenshot on the extension page [1] provides an excellent example: when given the misspelling עיברי, the most likely suggestion, עברי, is given only as the 8th suggestion, and the first suggestions are highly unlikely. The sixth suggestion, ערביי, is especially unlikely (requiring one accidental transpose and one movement).

I'd like OpenOffice to use common-sense edit-distance based heuristics to decide which suggestion to give first (i.e., one typing mistake is more likely than two), but also Hebrew-specific rules regarding the "cost" of these edits, e.g., that in Hebrew omitting or adding a vowel (em kri'a) is more likely than omitting or adding just any random letter.
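To make the kind of ranking I have in mind concrete, here is a rough Python sketch. It is not how hunspell or OpenOffice actually order suggestions (I don't know that), and the cost values, the set of vowel letters, and the function names are all my own assumptions for illustration. It computes an edit distance where inserting or deleting an em kri'a letter costs less than any other edit, and sorts the candidates by that distance.

```python
# Illustrative sketch only: the weights and names below are assumptions,
# not values or APIs taken from Hspell, hunspell, or OpenOffice.
VOWEL_LETTERS = set("אהוי")  # em kri'a letters (assumed set)
VOWEL_COST = 0.5             # assumed: adding/omitting a vowel letter is cheap
DEFAULT_COST = 1.0           # assumed: any other single edit


def indel_cost(ch):
    """Cost of inserting or deleting one letter."""
    return VOWEL_COST if ch in VOWEL_LETTERS else DEFAULT_COST


def weighted_distance(a, b):
    """Edit distance with adjacent transpositions (optimal string
    alignment), where vowel-letter insertions/deletions are cheaper."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + indel_cost(a[i - 1])
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + indel_cost(b[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0.0 if a[i - 1] == b[j - 1] else DEFAULT_COST
            best = min(
                d[i - 1][j] + indel_cost(a[i - 1]),  # delete from a
                d[i][j - 1] + indel_cost(b[j - 1]),  # insert into a
                d[i - 1][j - 1] + sub,               # match or substitute
            )
            # Swap of two adjacent letters counts as a single edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                best = min(best, d[i - 2][j - 2] + DEFAULT_COST)
            d[i][j] = best
    return d[-1][-1]


def rank_suggestions(misspelled, candidates):
    """Order candidate corrections from most to least likely."""
    return sorted(candidates, key=lambda w: weighted_distance(misspelled, w))


if __name__ == "__main__":
    for word in rank_suggestions("עיברי", ["ערביי", "עברי"]):
        print(word, weighted_distance("עיברי", word))
```

With these assumed weights, עברי (one omitted vowel letter) sorts ahead of ערביי, which needs more than one edit. A real implementation would presumably also break ties by word frequency, which this sketch ignores.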
Hebrew also has letters that sound the same (e.g., tav and tet) or nearly the same, and there are a bunch of other rules I'd like to see. I believe that hunspell's dictionary in fact has a way to give such correction rules, but I don't know how to correctly write them, or how to make OpenOffice use them.

I (and thousands of other OpenOffice users in Israel) would be grateful if someone could look into these issues.

Nadav.

[1] http://extensions.services.openoffice.org/en/project/dict-he

--
Nadav Har'El                        | Tuesday, Nov 2 2010, 25 Heshvan 5771
n...@math.technion.ac.il            |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |The person who knows how to laugh at
http://nadav.harel.org.il           |himself will never cease to be amused.

_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il