URL: <https://savannah.gnu.org/bugs/?66051>
Summary: [troff] permit special characters to have bespoke hyphenation codes
Group: GNU roff
Submitter: gbranden
Submitted: Wed 31 Jul 2024 09:08:38 PM UTC
Category: Core
Severity: 1 - Wish
Item Group: Feature change
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Planned Release: None

_______________________________________________________

Follow-up Comments:

-------------------------------------------------------
Date: Wed 31 Jul 2024 09:08:38 PM UTC By: G. Branden Robinson <gbranden>

This idea is a descendant of bug #42870, which asked for something a little more modest.

The concept is this. If we can do this...

.hcode ß ß

...why can't we do this?

.hcode \[ss] \[ss]

This has long produced a diagnostic.

$ printf '.hcode \\[ss] \\[ss]\n' | ~/groff-stable/bin/groff
troff:<standard input>:1: error: hyphenation code must be ordinary character

I suggested an answer in bug #66040, comment 9.

> Because the formatter doesn't know what [hyphenation code] value to give [the special character]. Under the hood, [a hyphenation code] is just a character code--in other words, on an ISO 8859 system, the hyphenation codes for 'a' through 'z' are 97 through 122--but our documentation stands on its head to avoid saying that. The trouble is that there is a potentially larger space of _sui generis_ special characters, by which I mean ones that don't belong to an equivalence class of a Basic Latin letter. [... The] German Eszett [for example] is not. If we had an Icelandic locale, thorn and eth would similarly have to have hyphenation codes above 127 decimal.
>
> The real fun comes when you add letters from multiple ISO 8859 character sets. Before long you're going to have collisions.
>
> So it's good that our documentation does the headstand. We should not disclose what the hyphenation code values _are_, we need only to ensure that they sort into the correct equivalence classes, so that they then interoperate as desired with the hyphenation patterns.
>
> When we get support for UTF-8-encoded hyphenation pattern files, things will become straightforward again.
>
> In the meantime, what I think I will do is use a `static int` to mint a sequence number (starting at 256) for hyphenation codes any time a special character needs one _sui generis_.

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66051>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
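
The last quoted paragraph proposes minting sequence numbers from a `static int`, starting at 256, whenever a special character needs a hyphenation code of its own. A minimal C++ sketch of that idea might look like the following. The function name `sui_generis_hcode`, the `std::map` cache, and the choice to key on the special character's name are all assumptions made for illustration; none of this is taken from groff's source.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper, not groff's actual code.  Returns a hyphenation
// code for the special character called `name` (for example "ss" for
// \[ss]), minting a new code the first time the name is seen.
int sui_generis_hcode(const std::string &name)
{
  assert(!name.empty());  // a special character always has a name

  // Ordinary 8-bit characters occupy codes 0 through 255, so minted
  // codes begin at 256 and cannot collide with any of them.
  static int next_code = 256;

  // Remember what was handed out so that the same special character
  // always lands in the same equivalence class.
  static std::map<std::string, int> minted;

  std::map<std::string, int>::iterator it = minted.find(name);
  if (it != minted.end())
    return it->second;

  int code = next_code++;
  minted[name] = code;
  return code;
}
```

Because callers only ever compare the returned values for equality, the actual numbers stay an implementation detail, which matches the quoted point that the documentation need not disclose what the values are.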