On 08.03.2011 at 18:56, Jürgen Spitzmüller wrote:

> IIRC somebody already brought up this issue some time back.

JMarc's mail from 2011-02-02 with the subject "Spell checking and breaking words". Yes, we should do something about this.

> Hunspell can check both complex composites constructed with (hard) hyphens
> (as in "fifty-year-old chap") and, more interestingly, "elliptical" or
> "fractal" composites that use a hard hyphen in order to refer to a "shared"
> morpheme in a paired word form (as in "two- and threefold" or German
> "Betriebsklima und -sicherheit").
>
> At least in German, both types of these beasts are pretty frequent, so we
> should pass hard hyphens to the speller if hunspell is used, instead of just
> trimming them off the word (as we do now). The results, at least here, are
> way better. Currently, for instance, Hunspell would mark "-sicherheit" as
> misspelled (since we pass "sicherheit") and suggest "Sicherheit" and, nota
> bene, "-sicherheit".
>
> The attached patch is an attempt in this direction. It passes hard hyphens
> (and nonbreakdashes) to the speller if this speller is hunspell.
>
> However, since the isWordSeparator function is not only used by the
> spellchecker, this change affects other things as well. For instance,
> currently, if you set the cursor left of the word "socio-linguistics" and hit
> Insert > Index entry, only "socio" is copied inside the index inset. With my
> patch, "socio-linguistics" as a whole is copied inside. This seems more
> correct to me, at least wrt English and German. However, it is of course not
> suitable that the index entry generation differs wrt the speller that is
> used.
>
> So my question is, should we
>
> * limit the change to the hunspell checker only, which means that we set up
>   an extra isWordSeparator function (or an option) for the spellchecker?
>
> * treat hard hyphens not as word separators in general, i.e. ditch the
>   canHandleSplitMorphemes() part of the patch?

* make it a property of the language? Add a list of characters to include in words?
We have to add this anyway to drop the escape chars field from the spell checker settings. Give the spell checker backend some control over this list: aspell, e.g., can remove the dash from it, while hunspell would keep it.

> To me, the second option makes more sense, but of course this is the more
> intrusive change.

Stephan
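
A minimal sketch of the third option discussed above: the language carries a list of extra word characters, and the spell checker backend may filter that list. All names here (Language, SpellBackend, filterWordChars, splitWords, and this isWordSeparator signature) are hypothetical and are not taken from the LyX sources or the attached patch. The sketch is ASCII-only for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical: characters a language counts as part of a word,
// in addition to letters and digits.
struct Language {
    std::string code;
    std::set<char> extraWordChars;
};

// Hypothetical backend interface: a backend may drop characters
// from the language's list that it cannot handle inside words.
struct SpellBackend {
    virtual ~SpellBackend() = default;
    virtual std::set<char> filterWordChars(std::set<char> chars) const {
        return chars;  // default: keep the language's list unchanged
    }
};

// Assumed behaviour: a hunspell-like backend keeps the hyphen.
struct HunspellLike : SpellBackend {};

// Assumed behaviour: an aspell-like backend removes the hyphen.
struct AspellLike : SpellBackend {
    std::set<char> filterWordChars(std::set<char> chars) const override {
        chars.erase('-');
        return chars;
    }
};

// A character separates words unless it is alphanumeric or listed.
bool isWordSeparator(char c, const std::set<char>& wordChars) {
    if (std::isalnum(static_cast<unsigned char>(c)))
        return false;
    return wordChars.count(c) == 0;
}

// Split text into the words that would be handed to the speller.
std::vector<std::string> splitWords(const std::string& text,
                                    const std::set<char>& wordChars) {
    std::vector<std::string> words;
    std::string current;
    for (char c : text) {
        if (isWordSeparator(c, wordChars)) {
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}
```

Under these assumptions, a hunspell-like backend would receive "-sicherheit" as one word, while an aspell-like backend would receive "sicherheit". Code that is independent of the speller, such as index entry generation, could consult the language's unfiltered list, so its behaviour would not depend on which speller is in use.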