Follow-up Comment #9, bug #66040 (group groff):

[comment #8 comment #8:]
> [comment #6 comment #6:]
> > [comment #5 comment #5:]
> > > The special character \['e] in the second line is still
> > > rejected even after being assigned a code in the first.
> >
> > I noticed that too. I think it's a bug. I'm working on it.
>
> I mean, sure, that could be called a bug... but why would it be a bug
> that the special character is unrecognized only on its second
> appearance?

If you mean "as the second member of a pair with itself", I have an
answer: the formatter doesn't know what value to give it.

Under the hood, it's just a character code--for example, on an ISO 8859
system, the hyphenation codes for 'a' through 'z' are 97 through 122--but
our documentation stands on its head to avoid saying that.

The trouble is that there is a potentially larger space of _sui generis_
special characters, by which I mean ones that don't belong to an
equivalence class of a Basic Latin letter. Accented vowels are members of
those equivalence classes in GNU _troff_ but the German Eszett is not. If
we had an Icelandic locale, thorn and eth would similarly have to have
hyphenation codes above 127 decimal.

The real fun comes when you add letters from multiple ISO 8859 character
sets. Before long you're going to have collisions.

So it's good that our documentation does the headstand. We should not
disclose what the hyphenation code values _are_; we need only ensure that
they sort into the correct equivalence classes, so that they then
interoperate as desired with the hyphenation patterns.

When we get support for UTF-8-encoded hyphenation pattern files, things
will become straightforward again.

In the meantime, what I think I will do is use a `static int` to mint a
sequence number (starting at 256) for hyphenation codes any time a special
character needs one _sui generis_.

> Why shouldn't it just be recognized in any .hcode invocation?

See above.

> And as it happens, that suggestion (bug #42870) was filed ten years
> ago--and its fixing would make .hcode's behavior simple to use and
> simple to document, whereas allowing it only in some cases complicates
> both of those.

The solution I proposed above would indeed fix bug #42870, I think. Do you
agree? If so, we probably ought to copy the meat of it over to that ticket
and mark it "in progress".

> > The part I'm griefed about is "(not a special character escape
> > sequence)".
>
> It's griefworthy but it's not inaccurate.

It is. A special character is _sometimes_ okay, and has been for a long
time, as I tried to illustrate in comment #6. But it would be laborious
and perhaps embarrassing to explain the cases where it isn't.

> Overturning it is #42870's life goal.

Yes--if my plan survives contact with the enemy, the parenthetical can
die.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66040>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
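
For concreteness, the `static int` idea floated in the comment above might look roughly like the sketch below. This is only an illustration under stated assumptions, not GNU troff's actual code: the function names `mint_hyphenation_code` and `hyphenation_code_for` and the table `special_hcodes` are hypothetical, and the special character names "ss" and "Tp" merely serve as sample keys.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not groff's real API): hand out a fresh
// hyphenation code for a special character that belongs to no Basic
// Latin letter's equivalence class.
static int mint_hyphenation_code()
{
  // Start at 256 so that minted codes stay clear of 8-bit character
  // codes, as the comment above proposes.
  static int next_code = 256;
  return next_code++;
}

// Hypothetical table mapping special character names to the
// hyphenation codes minted for them.
static std::map<std::string, int> special_hcodes;

// Return the code already assigned to `name`, minting a new one the
// first time the name is seen.
int hyphenation_code_for(const std::string &name)
{
  auto it = special_hcodes.find(name);
  if (it != special_hcodes.end())
    return it->second;
  int code = mint_hyphenation_code();
  special_hcodes[name] = code;
  return code;
}
```

In this sketch, repeated lookups of one name return the same value, so a special character could be paired with itself; the numeric values carry no meaning beyond being distinct, which is consistent with not disclosing them in the documentation.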