Follow-up Comment #2, bug #67372 (group groff):

Hi Ingo,

Thanks for the intriguing report.

At 2025-07-28T11:59:51-0400, Ingo Schwarze wrote:
> With the mnemonics
>
> v = variable name
> d = delimiter character
> i = identifier name
> s = suffix string
>
> consider the two following roff(7) input files:
>
> .ds v did
> \A\*vs
>
> .nr v 121
> \A\nvs
>
> With groff-1.22.4, both print "1s" when run through nroff(1) because
> d and 1 are valid delimiter characters and i and 2 are valid
> identifiers, so \Adids and \A121s both result in 1s.
>
> With groff-1.23.0, both only print "1" without the trailing "s" and
> also print this incorrect message:
> troff:tmp.roff:2: warning: missing closing delimiter in identifier
> validation escape sequence (got a newline)

Can reproduce.

_groff_ Git HEAD is even more explicit with its diagnostic:


$ cat ATTIC/67372a.groff
.ds v did
\A\*vs
$ ./build/test-groff -aww ATTIC/67372a.groff
troff:ATTIC/67372a.groff:2: warning: missing closing delimiter in identifier
validation escape sequence; expected character 'd', got a newline
<beginning of page>
1


> I say "incorrect" because the closing delimiter is clearly present and
> not "missing".

I confess to some surprise that this exhibit ever worked as intended;
given my understanding of _groff_'s Texinfo manual, the string
interpolation should be assigned a different "input level" than
the identifier validation escape sequence.


5.6.3 Calling Macros
--------------------
...
   Escape sequence interpolation is of higher precedence than escape
sequence argument interpretation.  This rule affords flexibility in
using escape sequences to construct parameters to other escape
sequences.

     .ds family C\" Courier
     .ds style I\" oblique
     Choose a typeface \f(\*[family]\*[style]wisely.
         => Choose a typeface wisely.

In the above, the syntax form '\f(' accepts only two characters for an
argument; the example works because the subsequent escape sequences are
interpolated before the selection escape sequence argument is processed,
and strings 'family' and 'style' interpolate one character each.(2)
(*note Using Escape Sequences-Footnote-2::)



5.38.2 Compatibility Mode
-------------------------

Some syntactical and behavioral differences between GNU and AT&T
'troff's are thought too important to neglect; GNU 'troff' therefore
makes available a "compatibility mode" in an effort to keep documents
prepared for AT&T 'troff' rendering well.

...

   In compatibility mode, GNU 'troff' accepts several characters as
delimiters that it ordinarily rejects because they can begin numeric
expressions and therefore may be ambiguous to the document maintainer.
This set of additional delimiters comprises '0123456789+-(.|'.

   Normally, GNU 'troff' keeps track of delimited arguments'
interpolation depth.  In compatibility mode, it does not.

     .ds xx '
     \w'abc\*(xxdef'
         => 168 (normal mode on a terminal device)
         => 72def' (compatibility mode on a terminal device)


Some of this language is new to the post-1.23.0 trunk.

NEWS:

*  In compatibility mode, GNU troff now accepts delimiters that it
   rejects when not in compatibility mode--namely, ordinary characters
   that can validly begin numeric expressions (which are often
   delimited).  This change improves compatibility with AT&T troff.


...and apparently with _groff_ 1.22.4 and earlier.

And indeed:


$ ./build/test-groff -Caww ATTIC/67372a.groff
<beginning of page>
1s


I'm having trouble finding any evidence that the behavior you're seeing
was ever documented.

Returning to this point:

   Normally, GNU 'troff' keeps track of delimited arguments'
interpolation depth.  In compatibility mode, it does not.


That's not a new claim.  Here's what the _groff_ 1.22.4 manual said:

5.34 Implementation Differences
===============================
...
   Two other features are controlled by '-C'.  If not in compatibility
mode, GNU 'troff' preserves the input level in delimited arguments:

     .ds xx '
     \w'abc\*(xxdef'

In compatibility mode, the string '72def'' is returned; without '-C' the
resulting string is '168' (assuming a TTY output device).
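
To make the manual's claim concrete, here is a toy model in Python.  It
is a sketch under stated assumptions, not a description of GNU troff's
code.  It assumes that each input character carries an "input level" (0
for the file, 1 for text interpolated from a string), that outside
compatibility mode a closing delimiter must match the opening one in
both character and level, and that every character is 24 units wide,
which is consistent with the 72 and 168 in the example.  The names
`tag` and `width_escape` are invented for illustration.

```python
# Toy model only; none of these names exist in groff.
UNITS_PER_CHAR = 24  # assumed terminal-device character width


def tag(text, strings):
    """Expand \\*(xx string references; pair each character with a level."""
    out, i = [], 0
    while i < len(text):
        if text.startswith("\\*(", i):
            name = text[i + 3:i + 5]
            out.extend((ch, 1) for ch in strings[name])  # interpolated: level 1
            i += 5
        else:
            out.append((text[i], 0))  # read from the file itself: level 0
            i += 1
    return out


def width_escape(text, strings, compatible):
    """Evaluate a leading \\w escape; return the width plus leftover text."""
    chars = tag(text, strings)
    assert [c for c, _ in chars[:2]] == ["\\", "w"]
    open_ch, open_level = chars[2]
    arg, rest = [], chars[3:]
    for pos, (ch, level) in enumerate(rest):
        # Compatibility mode ignores the level; normal mode requires a match.
        if ch == open_ch and (compatible or level == open_level):
            leftover = "".join(c for c, _ in rest[pos + 1:])
            return str(UNITS_PER_CHAR * len(arg)) + leftover
        arg.append(ch)
    raise ValueError("missing closing delimiter")


strings = {"xx": "'"}
source = "\\w'abc\\*(xxdef'"
print(width_escape(source, strings, compatible=False))  # 168
print(width_escape(source, strings, compatible=True))   # 72def'
```

In this model the apostrophe that arrives from the string `xx` sits at
level 1, so in normal mode it is taken as part of the argument rather
than as the closing delimiter.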


We didn't have a regression test for this behavior, so this may be
a behavior change I introduced unwittingly.  I'll have to bisect to see.

What intrigues me here is that both the old and the new language in the
manual address whether the delimited _argument_ shares an input level
with the escape sequence itself (meaning the escape character and the
following character, which I term the "function selector"), but neither
makes a claim about the input level of the delimiters themselves!

I think therefore that we are in unspecified territory here.
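
Why the delimiters' own level matters can be sketched with a toy model
in Python.  This is an illustration only, not GNU troff's code, and
every name in it is invented.  It tags each character with an input
level (0 for the file, 1 for string-interpolated text) and lets the
caller choose what the closing delimiter's level is compared against:
the level of the opening delimiter, or the level of the escape sequence
that introduced it.  Both policies are assumptions; whether either
corresponds to what any _groff_ release does is unconfirmed.

```python
# Toy model only; these names are invented and do not exist in groff.

def tag(text, strings):
    """Expand \\*x (one-character string names); pair each char with a level."""
    out, i = [], 0
    while i < len(text):
        if text.startswith("\\*", i):
            out.extend((ch, 1) for ch in strings[text[i + 2]])  # level 1
            i += 3
        else:
            out.append((text[i], 0))  # level 0: read from the file itself
            i += 1
    return out


def identifier_escape(line, strings, compare_with):
    """Model \\A at the start of a line; return (output, warning or None)."""
    chars = tag(line, strings) + [("\n", 0)]
    escape_level = chars[0][1]
    open_ch, open_level = chars[2]
    wanted = open_level if compare_with == "delimiter" else escape_level
    arg, rest = [], chars[3:]
    for pos, (ch, level) in enumerate(rest):
        closed = ch == open_ch and level == wanted
        if closed or ch == "\n":
            # Simplification: any nonempty argument without blanks is valid.
            valid = "1" if arg and not set(arg) & {" ", "\t"} else "0"
            if closed:
                leftover = "".join(c for c, _ in rest[pos + 1:-1])
                return valid + leftover, None
            return valid, f"expected character '{open_ch}', got a newline"
        arg.append(ch)


print(identifier_escape("\\A\\*vs", {"v": "did"}, "delimiter"))
# ('1s', None)
print(identifier_escape("\\A\\*vs", {"v": "did"}, "escape"))
# ('1', "expected character 'd', got a newline")
```

Under the first policy both `d` characters sit at level 1 and pair up,
leaving the `s` as ordinary text.  Under the second, no `d` at level 0
ever arrives, so the argument swallows the `s` and runs to the newline.
That the second policy resembles the output shown earlier is suggestive
only.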

> The program complaining about the newline indicates that the parser
> got broken; the parser should never reach the newline while parsing
> the \A sequence.

I wouldn't phrase it that way; the parser will "reach" the newline, but
it might do so in the function `skip_line()`, in which case the newline
has no syntactical importance apart from ending the skipping process
once consumed (unless, I think, it is escaped with the escape
character).

https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n2471

(An _escaped_ newline will not cause `tok.is_newline()` to return true,
because an escaped newline is a different token type.  The reader might
be interested in bug #66987.  If we get new tests out of this ticket, I
wonder if my claim there about all tests passing will hold up.)

But, yes, there is clearly a behavior change here.  I'm suspending
judgment as to whether it should be reverted, or better documented (and
the "NEWS" section for _groff_ 1.23.0 updated).

I'm curious to see the real-world context in which this arose.

> I'm asking for help debugging because i'm very unfamiliar with the
> bowels of the roff parser...

I think I know which piece of bowel this is.

https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n1509



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?67372>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
