Follow-up Comment #9, bug #66675 (group groff):

At 2025-02-01T17:05:10-0500, Dave wrote:
> Follow-up Comment #8, bug #66675 (group groff):
>
> I sure don't want to undo progress toward a major goal. But I feel
> like a lot of what is described here is missing the bigger picture.
I'm probably musing excessively, recording notes for my own edification
(or for whoever succeeds me in a speeding bus scenario).

> Most places in groff that a \[] character comes up, whatever's inside
> the brackets has no _meaning_ to groff. It's just an identifier,

Agreed. It follows the same rules as other GNU troff identifiers.

> that gets matched to a symbol name somewhere:

Yes, but. Unfortunately, the fast-and-loose approach to nomenclature
practiced early in groff development plants a land mine here. The
formatter has a class called "symbol", and it has nothing in particular
to do with characters or glyphs. A "symbol" is an object with an
identifier that you might, say, store in a dictionary.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/include/symbol.h?h=1.23.0

I raise this issue simply so that people are aware of one of the (many)
factors that frustrate coherent and consistent discussion of groff.

> ultimately, either from a font file, or defined by a .char-family
> request.

Yes. Right now, in my head, I'm using the terms "glyph" and
"user-defined character" for these, respectively.

> (The complex rules in the manual under "GNU troff searches for a
> symbol as follows" all boil down to one of these two sources.)

Agreed.

> That some of these identifiers happen to follow a naming convention
> that _also_ indicates a Unicode character is, as far as groff itself
> is concerned, irrelevant. That's for the convenience of humans who
> want to know what that symbol represents. So groff trying to divine
> meaning from this identifier is usually the wrong thing to do.

Usually, yes. The "progress toward a major goal" created an exception;
the machinery that writes the content of a `device` request or an `\X`
escape sequence needs to:

1. Pass through valid Unicode special character escape sequences,
   including composite ones (see groff_char(7)), as-is;
2. Attempt to translate any other special character to form #1 above;
3. Warn if #2 fails.

> What if you have a string that is also serving double duty in, say, a
> PDF table of contents entry, where groff itself isn't rendering this
> string, but needs to encode it in a way that some other piece of
> software _can_ render it? In that case, identifiers that _are_ also
> unicode characters matter.

Yes. Instead of "_are_", I'd say "admit an obvious translation to one
or more Unicode code points", but yes. Hence the procedure above.

> But such a string may contain both the symbol \[u2026] and the symbol
> \[gobbledygook]. Groff can figure out that one of those represents a
> Unicode character, and encode it properly in the TOC. What does it do
> with \[gobbledygook]? I have no idea, but that's a question it has to
> answer for _any_ symbol it can't directly put into a TOC--whether that
> symbol is \[gobbledygook] or \[u202Z]. Neither is a Unicode
> character, but the fact that one isn't but looks like it _could_ be,
> whereas the other isn't and looks nothing like one, ought to be
> immaterial. Handling them differently seems fundamentally wrong.

I agree. It's a lexing/parsing problem.
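To make that concrete, here's a minimal sketch of the sort of input the
procedure above has to cope with. (The character name "foobar" and the
"ps: example" device-control text are made up for illustration; they
are not real driver commands.)

.\" "\[foobar]" is a user-defined character with no Unicode mapping.
.char \[foobar] FB
.\" Case 1: a valid Unicode special character escape sequence is
.\" passed through to the output driver as-is.
.device ps: example \[u2026]
.\" Case 2: a special character with an obvious Unicode equivalent
.\" (here "\[em]", the em dash) gets translated to that form, \[u2014].
.device ps: example \[em]
.\" Case 3: a special character that can't be translated to a
.\" "\[uXXXX]" form provokes a warning.
.device ps: example \[foobar]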
> OK, now you can tell me where I'm misunderstanding some essential
> aspect of the problem that invalidates everything I wrote.

Not at all. I stand by comment #4; you identified a flaw in code I
wrote, and I aim to fix it. (The rest of this is pretty much just a
further status report.)

When I picked up my tools to do so, I got a surprise (comment #5).

I've made more headway (comment #7) and have a rewrite of
`define_character()` that lexes its arguments beautifully and
then...fails to actually create a new character definition. Huh?

Along the way, I detected what looks like pointless code in the
existing function definition that made its logic harder to understand.
Figuring out what was going on led me down the path of implementing a
`pchar` request, which doesn't tell me what I initially had in mind in
response to other frustrations (resolution order), but does other
useful things.

.pchar a
character 'a'
  is not translated
  does not have a macro
  special translation: 0
  hyphenation code: 97
  flags: 0
  ASCII code: 97
  asciify code: 0
  is found
  is transparently translatable
  is not translatable as input
  mode: normal
.pchar \[ua]
special character "ua"
  is not translated
  does not have a macro
  special translation: 0
  hyphenation code: 0
  flags: 0
  ASCII code: 0
  asciify code: 0
  is found
  is transparently translatable
  is not translatable as input
  mode: normal
.char \[foo] bar
.pchar \[foo]
special character "foo"
  is not translated
  has a macro
  special translation: 0
  hyphenation code: 0
  flags: 0
  ASCII code: 0
  asciify code: 0
  is found
  is transparently translatable
  is not translatable as input
  mode: normal

I don't plan to document this for groff 1.24. I really hate the use of
the verb "translation" to mean 3 different things; "is found" seems
always to be true; "flags" should get an English description; and the
contents of the macro should be reported (that's a heavy lift in
itself, but with a potentially huge payoff).

But even in this primitive form it's helpful for troubleshooting, and
it suggests that I should de-document and remove the new `phcode`
request; this new `pchar` supersedes it.

Whenever I get around to solving the resolution-reporting problem,
maybe a good name for that request would be `pfindchar`.

_______________________________________________________
Reply to this item at:

  <https://savannah.gnu.org/bugs/?66675>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/