[bug #66919] [troff] .hcode sometimes fails to accept a special character as a first argument

G. Branden Robinson Mon, 17 Mar 2025 23:01:07 -0700

Follow-up Comment #5, bug #66919 (group groff):

Hi Dave,

At 2025-03-18T01:25:14-0400, Dave wrote:
> Follow-up Comment #4:
>
> [comment #3:]
>> We seem to be talking totally past each other.
>
> It's our superpower!

Fortunately, some kryptonite appears to be arriving on the scene.

> Happily, after only 5 minutes of frustration I've managed to craft an
> input file that shows the behavior change from older groffs (tested on
> 1.19.2, 1.22.4, and 1.23.0) to current groff.  (Or current as of when
> this bug was opened a couple days ago.  I haven't built your latest
> push.)

I don't think it's changed.  I've thrashed the hell out of `pline`
behavior but not touched hyphenation logic in a little while.

> The following has to be encoded via Latin-1 to accommodate .hcode's
> limitation on accepting only an 8-bit character as its second
> argument.

Right.

> .ll 1n
> lanteronial
> lanter\[~o]nial
> .hcode \[~o] õ
> lanter\[~o]nial
>
> I recommend running the file with groff's "-a -Wbreak" options, though
> any other output format should also suffice.

Yes.  That pair of options should be near the top of anyone's bag of
tricks when debugging hyphenation problems.

> Here's the -a output I see in all listed older groffs:
>
> <beginning of page>
> lantero<hy>
> nial
> lanter<~o>nial
> lanter<~o><hy>
> nial
>
> And here's current groff:
>
> <beginning of page>
> lantero<hy>
> nial
> lanter<~o>nial
> lanter<~o>nial

Confirmed.  I can reproduce this.

> The .hcode that formerly worked can be seen here to not work, based on
> the different breaking behavior.  And, notably, changing the first
> hcode parameter from the special-character version to the Latin-1
> version of the character _does_ make this match older groff's
> behavior.

That makes sense to me; since it's an "ordinary character", it forces
the character õ to adopt its own code point value as its hyphenation
code.

We can do the same thing with the equals sign, and summon a bespoke
hyphenation code from the vasty deep.

$ printf '.pchar =\n.hcode = =\n.pchar =\n' \
    | ./build/test-groff 2>&1 | grep hyph
  hyphenation code: 0
  hyphenation code: 61

And that's how I fix the bug you're reporting.

$ iconv -f iso-8859-1 ATTIC/66919c.groff
.ll 1n
lanteronial
.pchar \[~o]
lanter\[~o]nial
.\" Now let's pretend we speak Portuguese.
.hcode õ o
.hcode \[~o] õ
.pchar \[~o]
lanter\[~o]nial
$ ./build/test-groff -a -Wbreak ATTIC/66919c.groff
<beginning of page>
lantero<hy>
nial
special character "~o"
  is not translated
  does not have a macro
  special translation: 0
  hyphenation code: 0
  flags: 0
  ASCII code: 0
  asciify code: 245
  is found
  is transparently translatable
  is translatable as input
  mode: normal
lanter<~o>nial
special character "~o"
  is not translated
  does not have a macro
  special translation: 0
  hyphenation code: 111
  flags: 0
  ASCII code: 0
  asciify code: 245
  is found
  is transparently translatable
  is translatable as input
  mode: normal
lanter<~o><hy>
nial
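
For anyone who finds the above easier to follow as code, here is a toy
model in Python.  It is only a sketch of the behavior described in this
thread, not groff's implementation: the `hcode` function, the dictionary
of codes, and the assumed starting state (only ASCII letters have
nonzero hyphenation codes) are all hypothetical stand-ins.

```python
# Toy model of the hyphenation-code behavior described in this thread.
# This is NOT groff's implementation; every name here is hypothetical.

O_TILDE = "\u00f5"    # the ordinary (Latin-1) character, code point 245
SPECIAL = "\\[~o]"    # stands in for the special character \[~o]


def default_codes():
    """Assumed starting state: only ASCII letters have nonzero codes."""
    codes = {}
    for cp in range(ord("a"), ord("z") + 1):
        codes[chr(cp)] = cp            # a-z map to themselves
        codes[chr(cp).upper()] = cp    # A-Z share the lowercase code
    return codes


def hcode(codes, dst, src):
    """Model `.hcode dst src`; a missing entry means code 0."""
    if dst == src and len(dst) == 1:
        # Reflexive case for an ordinary character: adopt its own code point.
        codes[dst] = ord(dst)
    else:
        # Otherwise copy whatever code the source has right now.
        # (For a reflexive request on a special character this model just
        # leaves the code as it is; no claim is made about groff there.)
        codes[dst] = codes.get(src, 0)


codes = default_codes()

hcode(codes, "=", "=")
print(codes["="])                 # 61, as in the .pchar output above

hcode(codes, SPECIAL, O_TILDE)    # source still has code 0
before_fix = codes[SPECIAL]
print(before_fix)                 # 0

hcode(codes, O_TILDE, "o")        # like `.hcode õ o`
hcode(codes, SPECIAL, O_TILDE)    # like `.hcode \[~o] õ`
print(codes[SPECIAL])             # 111
```

The point of the model is the ordering: copying from õ only helps once õ
itself has a nonzero code to copy.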

>> The assertion in the bug summary is that ".hcode no longer accepts a
>> special character as its first argument".
>
> You're right, I failed to qualify the summary with a "sometimes" in
> light of your modified example.  Done now.

That helps!

>> How are we to distinguish special characters that are created with a
>> default hyphenation code of zero from ones that have a hyphenation
>> code of zero assigned to them by copying from an ordinary character
>> that has a hyphenation code of zero?
>
> But "reflexive hcode," if you will (that is, ".hcode x x," for any
> value of x) is a special case that doesn't assign the hyphenation code
> of x to x, but assigns a new, unique hyphenation code to x.

Agreed, as illustrated above.  Where "new, unique" just means its code
point value in the character encoding, but the groff documentation is
coy about this fact for reasons that I think are mostly sound but carry
a cost in situations like this.

> (In some past bug report I ruminated on the unwisdom of using the same
> syntax for "reflexive hcode" as for other hcode assignments, but this
> is the syntax we've inherited.)

It makes me uneasy as well.

I continue to think that what you're diagnosing here is not any sort of
new problem with the `hcode` request, but the fact that I went with
option 1 in bug #66112.  In English, õ used to get a (nonzero)
hyphenation code in groff (because "latin1.tmac" gave it one), and now
it no longer does.

If we had a Portuguese localization file, I would strongly expect it to
assign a hyphenation code to õ.
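
To illustrate why the same request behaves differently depending on what
was set up beforehand, here is a small Python sketch.  It is a toy model
under stated assumptions, not groff code: `copy_code` is a hypothetical
helper, and the value 245 is only a placeholder for "some nonzero code",
since this thread does not say which value "latin1.tmac" assigned.

```python
# Toy comparison of two starting states; not groff's implementation.

O_TILDE = "\u00f5"    # ordinary character õ
SPECIAL = "\\[~o]"    # stands in for the special character \[~o]


def copy_code(codes, dst, src):
    """Model a non-reflexive `.hcode dst src`: copy the source's code."""
    codes[dst] = codes.get(src, 0)    # a missing entry means code 0


# Assumption: an older English setup gave õ some nonzero code.
# 245 is a placeholder value, not a figure taken from this thread.
older = {O_TILDE: 245}

# Current English setup as described above: õ has no hyphenation code.
current = {}

# Apply the equivalent of `.hcode \[~o] õ` to both tables.
for table in (older, current):
    copy_code(table, SPECIAL, O_TILDE)

print(older[SPECIAL], current[SPECIAL])    # 245 0
```

In this model the request itself is identical in both cases; only the
code available to copy from õ differs.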

Have I persuaded you?  If so, I propose that this ticket is
re-item-groupable to "Documentation", demanding of a "NEWS" item.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66919>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
