Follow-up Comment #5, bug #59397 (group groff):

[comment #3 comment #3:]
> I think it might be an open question as to whether letters
> from outside the basic Latin alphabet _should_ necessarily be
> hyphenated like their basic Latin "base characters".

There's a little fuzz in any automated hyphenation system. When encountering the string "project", groff can't know whether it's the noun, which is broken "proj-ect", or the verb, which would be "pro-ject". An LLM could probably figure it out, but short of integrating one of those into groff, it's just going to make its best guess and rely on the user to override it if it's wrong.

On the other hand, when a diacritic changes the syllabication, as in "expose" vs "exposé", it will pretty much (I hedge, but can't think of any exceptions) always do so by adding a syllable, and thus a potential break point. The patterns, presumably, are set up for the unaccented form, meaning groff will never use the additional break point offered by the accented form. But that's fine: it's better to not break a word in an acceptable spot than to break one in an unacceptable spot.

And anyway, those are the rarer cases. More commonly, the break points don't change whether a word such as "coöperate", "doppelgänger", or "débâcle" is written with or without the diacritics.

But in order for any of this to work, the adorned letters need hyphenation codes, which they don't have by default, hence this ticket.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59397>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/