Follow-up Comment #5, bug #66919 (group groff):

Hi Dave,

At 2025-03-18T01:25:14-0400, Dave wrote:
> Follow-up Comment #4:
>
> [comment #3 comment #3:]
>> We seem to be talking totally past each other.
>
> It's our superpower!

Fortunately, some kryptonite appears to be arriving on the scene.

> Happily, after only 5 minutes of frustration I've managed to craft an
> input file that shows the behavior change from older groffs (tested on
> 1.19.2, 1.22.4, and 1.23.0) to current groff. (Or current as of when
> this bug was opened a couple days ago. I haven't built your latest
> push.)

I don't think it's changed. I've thrashed the hell out of `pline`
behavior but not touched hyphenation logic in a little while.

> The following has to be encoded via Latin-1 to accommodate .hcode's
> limitation on accepting only an 8-bit character as its second
> argument.

Right.

> .ll 1n
> lanteronial
> lanter\[~o]nial
> .hcode \[~o] õ
> lanter\[~o]nial
>
> I recommend running the file with groff's "-a -Wbreak" options, though
> any other output format should also suffice.

Yes. That pair of options should be near the top of anyone's bag of
tricks when debugging hyphenation problems.

> Here's the -a output I see in all listed older groffs:
>
> <beginning of page>
> lantero<hy>
> nial
> lanter<~o>nial
> lanter<~o><hy>
> nial
>
> And here's current groff:
>
> <beginning of page>
> lantero<hy>
> nial
> lanter<~o>nial
> lanter<~o>nial

Confirmed. I can reproduce this.

> The .hcode that formerly worked can be seen here to not work, based on
> the different breaking behavior. And, notably, changing the first
> hcode parameter from the special-character version to the Latin-1
> version of the character _does_ make this match older groff's
> behavior.

That makes sense to me; since it's an "ordinary character", it forces
the character õ to adopt its own code point value as its hyphenation
code. We can do the same thing with the equals sign, and summon a
bespoke hyphenation code from the vasty deep.

$ printf '.pchar =\n.hcode = =\n.pchar =\n' | ./build/test-groff 2>&1 | grep hyph
hyphenation code: 0
hyphenation code: 61

And that's how I fix the bug you're reporting.

$ iconv -f iso-8859-1 ATTIC/66919c.groff
.ll 1n
lanteronial
.pchar \[~o]
lanter\[~o]nial
.\" Now let's pretend we speak Portuguese.
.hcode õ o
.hcode \[~o] õ
.pchar \[~o]
lanter\[~o]nial
$ ./build/test-groff -a -Wbreak ATTIC/66919c.groff
<beginning of page>
lantero<hy>
nial
special character "~o"
is not translated
does not have a macro
special translation: 0
hyphenation code: 0
flags: 0
ASCII code: 0
asciify code: 245
is found
is transparently translatable
is translatable as input
mode: normal
lanter<~o>nial
special character "~o"
is not translated
does not have a macro
special translation: 0
hyphenation code: 111
flags: 0
ASCII code: 0
asciify code: 245
is found
is transparently translatable
is translatable as input
mode: normal
lanter<~o><hy>
nial

>> The assertion in the bug summary is that ".hcode no longer accepts a
>> special character as its first argument".
>
> You're right, I failed to qualify the summary with a "sometimes" in
> light of your modified example. Done now.

That helps!

>> How are we to distinguish special characters that are created with a
>> default hyphenation code of zero from ones that have a hyphenation
>> code of zero assigned to them by copying from an ordinary character
>> that has a hyphenation code of zero?
>
> But "reflexive hcode," if you will (that is, ".hcode x x," for any
> value of x) is a special case that doesn't assign the hyphenation code
> of x to x, but assigns a new, unique hyphenation code to x.

Agreed, as illustrated above. "New, unique" just means the character's
code point value in the character encoding. The groff documentation is
coy about this fact for reasons that I think are mostly sound, but that
carry a cost in situations like this.

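To make the copy-versus-reflexive distinction concrete, here is a toy Python model of the hyphenation-code table. It is a hypothetical sketch, not groff's C++ implementation. The names `initial_codes`, `hcode`, and `code_of` are invented for illustration. The model also makes three assumptions, chosen to match the `pchar` output above:

* only a-z start with nonzero codes;
* special characters such as \[~o] start at 0;
* a non-reflexive `.hcode` copies the source's *current* code, even when that code is 0.

```python
# Hypothetical toy model of GNU troff's hyphenation-code table, written
# only to illustrate the behavior discussed in this thread; it is NOT
# groff's implementation. Assumptions (not taken from groff's source):
#   - ordinary characters are one-character strings whose code point is
#     what a reflexive ".hcode c c" assigns (Latin-1 == Unicode here);
#   - special characters such as \[~o] are multi-character names that
#     start out with hyphenation code 0;
#   - at startup only a-z have nonzero codes (mapped to themselves).
import string


def initial_codes():
    """Return a starting table in which only a-z have nonzero codes."""
    return {c: ord(c) for c in string.ascii_lowercase}


def code_of(codes, ch):
    """Look up a hyphenation code; unknown characters default to 0."""
    return codes.get(ch, 0)


def hcode(codes, target, source):
    """Apply one '.hcode target source' pair to the table."""
    if len(source) != 1:
        # In this model the second argument must be one ordinary character.
        raise ValueError("hcode source must be an ordinary character")
    if target == source:
        # Reflexive hcode: the character adopts its own code point.
        codes[target] = ord(source)
    else:
        # Otherwise the source's current code is copied, even if it is 0.
        codes[target] = code_of(codes, source)


if __name__ == "__main__":
    codes = initial_codes()
    # Dave's input: with nothing giving õ a code, this copies 0.
    hcode(codes, "\\[~o]", "õ")
    print(code_of(codes, "\\[~o]"))  # 0
    # Reflexive hcode on "=" yields its code point.
    hcode(codes, "=", "=")
    print(code_of(codes, "="))  # 61
    # The "pretend we speak Portuguese" sequence.
    hcode(codes, "õ", "o")
    hcode(codes, "\\[~o]", "õ")
    print(code_of(codes, "\\[~o]"))  # 111
```

Under these assumptions, the model reproduces the 61 and 111 reported by `pchar`. It also produces the 0 that would explain Dave's report: `.hcode \[~o] õ` faithfully copies õ's code, and that code is now zero.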
> (In some past bug report I ruminated on the unwisdom of using the same
> syntax for "reflexive hcode" as for other hcode assignments, but this
> is the syntax we've inherited.)

It makes me uneasy as well.

I continue to think that what you're diagnosing here is not any sort of
new problem with the `hcode` request, but the fact that I went with
option 1 in bug #66112. In English, õ used to get a (nonzero)
hyphenation code in groff (because "latin1.tmac" gave it one), and now
it no longer does. If we had a Portuguese localization file, I would
strongly expect it to assign a hyphenation code to õ.

Have I persuaded you? If so, I propose that this ticket be
re-item-grouped to "Documentation", warranting a "NEWS" item.

_______________________________________________________

Reply to this item at:

<https://savannah.gnu.org/bugs/?66919>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/