Follow-up Comment #2, bug #67372 (group groff):

Hi Ingo,
Thanks for the intriguing report.

At 2025-07-28T11:59:51-0400, Ingo Schwarze wrote:
> With the mnemonics
>
>   v = variable name
>   d = delimiter character
>   i = identifier name
>   s = suffix string
>
> consider the two following roff(7) input files:
>
>   .ds v did
>   \A\*vs
>
>   .nr v 121
>   \A\nvs
>
> With groff-1.22.4, both print "1s" when run through nroff(1)
> because d and 1 are valid delimiter characters and i and 2 are valid
> identifiers, so \Adids and \A121s both result in 1s.
>
> With groff-1.23.0, both only print "1" without the trailing "s" and
> also print this incorrect message:
> troff:tmp.roff:2: warning: missing closing delimiter in identifier
> validation escape sequence (got a newline)

Can reproduce. _groff_ Git HEAD is even more explicit with its diagnostic:

$ cat ATTIC/67372a.groff
.ds v did
\A\*vs
$ ./build/test-groff -aww ATTIC/67372a.groff
troff:ATTIC/67372a.groff:2: warning: missing closing delimiter in identifier validation escape sequence; expected character 'd', got a newline
<beginning of page>
1

> I say "incorrect" because the closing delimiter is clearly present and
> not "missing".

I confess to some surprise that this exhibit ever worked as intended; given my understanding of _groff_'s Texinfo manual, the string interpolation should be assigned a different "input level" than the identifier validation escape sequence.

    5.6.3 Calling Macros
    --------------------
    ...
    Escape sequence interpolation is of higher precedence than escape
    sequence argument interpretation. This rule affords flexibility in
    using escape sequences to construct parameters to other escape
    sequences.

         .ds family C\" Courier
         .ds style I\" oblique
         Choose a typeface \f(\*[family]\*[style]wisely.
             => Choose a typeface wisely.
    In the above, the syntax form '\f(' accepts only two characters for
    an argument; the example works because the subsequent escape
    sequences are interpolated before the selection escape sequence
    argument is processed, and strings 'family' and 'style' interpolate
    one character each.(2) (*note Using Escape Sequences-Footnote-2::)

    5.38.2 Compatibility Mode
    -------------------------

    Some syntactical and behavioral differences between GNU and AT&T
    'troff's are thought too important to neglect; GNU 'troff' therefore
    makes available a "compatibility mode" in an effort to keep documents
    prepared for AT&T 'troff' rendering well.
    ...
    In compatibility mode, GNU 'troff' accepts several characters as
    delimiters that it ordinarily rejects because they can begin numeric
    expressions and therefore may be ambiguous to the document
    maintainer. This set of additional delimiters comprises
    '0123456789+-(.|'.

    Normally, GNU 'troff' keeps track of delimited arguments'
    interpolation depth. In compatibility mode, it does not.

         .ds xx '
         \w'abc\*(xxdef'
             => 168 (normal mode on a terminal device)
             => 72def' (compatibility mode on a terminal device)

Some of this language is new to the post-1.23.0 trunk.

NEWS:

    * In compatibility mode, GNU troff now accepts delimiters that it
      rejects when not in compatibility mode--namely, ordinary characters
      that can validly begin numeric expressions (which are often
      delimited). This change improves compatibility with AT&T troff.

...and apparently with _groff_ 1.22.4 and earlier. And indeed:

$ ./build/test-groff -Caww ATTIC/67372a.groff
<beginning of page>
1s

I'm having trouble finding any evidence that the behavior you're seeing was ever documented.

Returning to this point:

    Normally, GNU 'troff' keeps track of delimited arguments'
    interpolation depth. In compatibility mode, it does not.

That's not a new claim. Here's what the _groff_ 1.22.4 manual said:

    5.34 Implementation Differences
    ===============================
    ...
    Two other features are controlled by '-C'.
    If not in compatibility mode, GNU 'troff' preserves the input level
    in delimited arguments:

         .ds xx '
         \w'abc\*(xxdef'

    In compatibility mode, the string '72def'' is returned; without '-C'
    the resulting string is '168' (assuming a TTY output device).

We didn't have a regression test for this behavior, so this may be a behavior change I introduced unwittingly. I'll have to bisect to see.

The point that intrigues me here is that while both the old and the new language in the manual specify whether the delimited _argument_ shares the input level of the escape sequence itself (meaning the escape character and the following character, which I term the "function selector"), neither makes a claim about the input level of the delimiters themselves! I therefore think that we are in unspecified territory here.

> The program complaining about the newline indicates that the parser
> got broken; the parser should never reach the newline while parsing
> the \A sequence.

I wouldn't phrase it that way; the parser will "reach" the newline, but it might do so in the function `skip_line()`, which means the newline has no syntactical importance beyond ending the skipping process once it is consumed (provided, I think, that it is not escaped with the escape character).

https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n2471

(An _escaped_ newline will not cause `tok.is_newline()` to return true, because an escaped newline is a different token type. The reader might be interested in bug #66987. If we get new tests out of this ticket, I wonder whether my claim there about all tests passing will hold up.)

But, yes, there is clearly a behavior change here. I'm suspending judgment as to whether it should be reverted or better documented (with the "NEWS" section for _groff_ 1.23.0 updated accordingly). I'm curious to see the real-world context in which this arose.

> I'm asking for help debugging because i'm very unfamiliar with the
> bowels of the roff parser...
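A toy model may help frame the debugging question. The Python sketch that follows is hypothetical and is not a transcription of `input.cpp`. It tags each token with the input level it was read at and, following the manual's wording, lets a delimiter close an argument only at the matching level unless compatibility mode is on. The names `Token`, `read_delimited`, `toks`, `CELL`, and the `open_level` parameter are invented for illustration. The `\w` case reproduces the manual's 168 versus 72def' results, assuming 24 basic units per character cell on a terminal, which is consistent with the manual's own figures (168 = 7 x 24). For `\A\*vs`, the outcome hinges on which level counts as the opener's. Using the level of the `d` itself (interpolated, level 1) gives the 1.22.4 result. Using the level of the `\A` escape (level 0) gives the 1.23.0 warning. Which of these GNU troff actually does is an assumption here, not something the sketch confirms.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    # Hypothetical toy model, not groff's input.cpp.
    ch: str      # one input character; "\n" marks the end of the line
    level: int   # 0 = top-level input, 1 = interpolated string/register


def read_delimited(tokens: list[Token], delim: str, open_level: int,
                   compat: bool) -> Optional[tuple[str, str]]:
    """Scan the tokens that follow an opening delimiter.

    Outside compatibility mode, a token closes the argument only if it
    is the delimiter character AND was read at open_level; in
    compatibility mode the character alone suffices.  Returns
    (argument, rest_of_line), or None if the newline arrives first
    (the "missing closing delimiter" case).
    """
    for i, tok in enumerate(tokens):
        if tok.ch == "\n":
            return None
        if tok.ch == delim and (compat or tok.level == open_level):
            arg = "".join(t.ch for t in tokens[:i])
            rest = "".join(t.ch for t in tokens[i + 1:] if t.ch != "\n")
            return arg, rest
    return None


def toks(text: str, level: int) -> list[Token]:
    """Turn a string into tokens that all share one input level."""
    return [Token(c, level) for c in text]


CELL = 24  # assumed basic units per character cell on a terminal device

# .ds xx '   then   \w'abc\*(xxdef'   (the opening ' is read at level 0)
w_tail = toks("abc", 0) + toks("'", 1) + toks("def'\n", 0)
for compat in (False, True):
    arg, rest = read_delimited(w_tail, "'", open_level=0, compat=compat)
    print(f"{len(arg) * CELL}{rest}")  # 168, then 72def'

# .ds v did   then   \A\*vs   (the opening "d" comes from the string)
a_tail = toks("id", 1) + toks("s\n", 0)
print(read_delimited(a_tail, "d", open_level=1, compat=False))  # ('i', 's')
print(read_delimited(a_tail, "d", open_level=0, compat=False))  # None
print(read_delimited(a_tail, "d", open_level=0, compat=True))   # ('i', 's')
```

The same reasoning applies to Ingo's register variant (`.nr v 121` with `\A\nvs`), where the interpolated digits stand in for the interpolated string.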
I think I know which piece of bowel this is.

https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.cpp?h=1.23.0#n1509

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?67372>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/