[bug #66040] [troff] no longer warns about unrecognized .hcode input

G. Branden Robinson Tue, 30 Jul 2024 17:05:40 -0700

Follow-up Comment #9, bug #66040 (group groff):

[comment #8:]
> [comment #6:]
> > [comment #5:]
> > > The special character \['e] in the second line is still
> > > rejected even after being assigned a code in the first.
> > 
> > I noticed that too.  I think it's a bug.  I'm working on it.
> 
> I mean, sure, that could be called a bug... but why would it be a bug that
> the special character is unrecognized only on its second appearance?


If you mean "as the second member of a pair with itself", I have an answer.

Because the formatter doesn't know what value to give it.  Under the hood,
a hyphenation code is just a character code (on an ISO 8859 system, the
hyphenation codes for 'a' through 'z' are 97 through 122), but our
documentation stands on its head to avoid saying that.  The trouble is that
there is a potentially much larger set of _sui generis_ special characters,
by which I mean ones that don't belong to the equivalence class of any Basic
Latin letter.  Accented vowels are members of those equivalence classes in
GNU _troff_, but the German Eszett is not.  If we had an Icelandic locale,
thorn and eth would likewise need hyphenation codes above 127 (decimal).
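To make the equivalence-class idea concrete, here is a minimal C++ sketch.  It
is a hypothetical model, not GNU _troff_'s actual code: `HcodeTable`,
`make_basic_latin_table`, `join_class`, and `same_class` are invented names,
and the table is simply keyed by character names such as `'e` (for \['e]) and
`ss` (for the Eszett, \[ss]).  It assumes an ASCII-compatible character set,
as every ISO 8859 part is.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model for illustration only; this is not GNU troff code.
// Each character name maps to an integer hyphenation code, and characters
// sharing a code form one equivalence class for pattern matching.
using HcodeTable = std::map<std::string, int>;

// Basic Latin lowercase letters get their own character codes as
// hyphenation codes: 97 ('a') through 122 ('z') on an ISO 8859 system.
HcodeTable make_basic_latin_table() {
  HcodeTable table;
  for (char c = 'a'; c <= 'z'; ++c)
    table[std::string(1, c)] = static_cast<unsigned char>(c);
  return table;
}

// Put `name` into the same equivalence class as `base`, the way an
// accented vowel such as \['e] is folded together with plain 'e'.
void join_class(HcodeTable &table, const std::string &name,
                const std::string &base) {
  assert(table.count(base) == 1);  // the base letter must already have a code
  table[name] = table.at(base);
}

// True when both characters have codes and those codes are equal.
bool same_class(const HcodeTable &table, const std::string &a,
                const std::string &b) {
  auto ia = table.find(a), ib = table.find(b);
  return ia != table.end() && ib != table.end() && ia->second == ib->second;
}
```

In this model, `join_class(t, "'e", "e")` makes \['e] hyphenate like 'e', while
`ss` has no Basic Latin letter to join and is left without a code until
something assigns it one of its own.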

The real fun comes when you add letters from multiple ISO 8859 character sets.
Before long you're going to have collisions: different letters from different
sets wanting the same code.
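One concrete collision, shown as a small C++ illustration (the function name
is hypothetical): byte 0xFE is thorn in ISO 8859-1 but t with cedilla in
ISO 8859-2, so a scheme that took hyphenation codes straight from byte values
would put both letters at code 254, in a single equivalence class.

```cpp
#include <cassert>

// Illustration only: the letter (as a Unicode code point) that byte 0xFE
// encodes in ISO 8859-1 and in ISO 8859-2. Other parts are not covered.
char32_t letter_at_byte_0xFE(int iso8859_part) {
  assert(iso8859_part == 1 || iso8859_part == 2);
  return iso8859_part == 1 ? U'\u00FE'   // thorn in ISO 8859-1
                           : U'\u0163';  // t with cedilla in ISO 8859-2
}
```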

So it's good that our documentation does the headstand.  We should not
disclose what the hyphenation code values _are_; we need only ensure that
they sort into the correct equivalence classes, so that they interoperate
as desired with the hyphenation patterns.

When we get support for UTF-8-encoded hyphenation pattern files, things will
become straightforward again.

In the meantime, what I think I will do is use a `static int` to mint
sequence numbers (starting at 256) as hyphenation codes whenever a special
character needs one _sui generis_.
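A hypothetical C++ sketch of that plan, not a patch against GNU _troff_: only
the `static int` counter starting at 256 comes from the text above.  The
function name `sui_generis_hcode` and the map that remembers already-minted
codes are assumptions.  The memo is what would let a special character keep
the same code on its second and later appearances.  Starting at 256 keeps
minted codes clear of every 8-bit ISO 8859 character code (0 to 255).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch, not GNU troff code: return the hyphenation code for
// a special character that belongs to no Basic Latin equivalence class,
// minting a fresh one the first time the character is seen.
int sui_generis_hcode(const std::string &name) {
  assert(!name.empty());
  // Minted codes start at 256, above every 8-bit ISO 8859 character code.
  static int next_code = 256;
  // Assumption: remember minted codes so that repeated requests for the
  // same character return the same code instead of a new one.
  static std::map<std::string, int> minted;
  auto found = minted.find(name);
  if (found != minted.end())
    return found->second;
  minted.emplace(name, next_code);
  return next_code++;
}
```

Because the counter only ever increases, two different characters can never be
handed the same minted code.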
 
> Why shouldn't it just be recognized in any .hcode invocation?

See above.

> And as it happens, that suggestion (bug #42870) was filed ten years ago--and
> its fixing would make .hcode's behavior simple to use and simple to document,
> whereas allowing it only in some cases complicates both of those.

The solution I proposed above would indeed fix bug #42870, I think.  Do you
agree?

If so, we probably ought to copy the meat of it over to that ticket and mark
it "in progress".
 
> > The part I'm griefed about is "(not a special character escape
> > sequence)".
> 
> It's griefworthy but it's not inaccurate.

It is inaccurate.  A special character is _sometimes_ okay, and has been for a
long time, as I tried to illustrate in comment #6.

But it would be laborious and perhaps embarrassing to explain the cases where
it isn't.

> Overturning it is #42870's life goal.

Yes--if my plan survives contact with the enemy, the parenthetical can die.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66040>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
