Follow-up Comment #1, bug #66686 (group groff):

> -------------------------------------------------------
> Date: Mon 20 Jan 2025 12:09:37 PM CST By: Dave <barx>
> Modifying an example from the info manual to change the delimiters for
> \w's argument to a + sign:
>
> $ echo "The length of the string 'abc' is \w+abc+u." | groff -Tascii \
> | cat -s
> The length of the string 'abc' is 72u.

I feel that this report is a close sibling of Paul Eggert's regarding
the `|` character.  I walked that one back because `|` is known in the
wild as a delimiter.

> This has worked for decades in groff, and works in Heirloom troff.

Yes, probably.

> It has no syntactical ambiguity,

This point is arguable, but it has _human_ ambiguity...

> because \w does not take a numeric expression.

Agreed, it does not.  But it might _appear in one_.

(I see multiple examples in each of "s.tmac", "m.tmac", and "om.tmac".)

> In a bleeding-edge groff, it no longer works.
>
> $ echo "The length of the string 'abc' is \w+abc+u." \
> | groff-latest -Tascii |
> cat -s
> troff:<standard input>:1: error: character '+' is not allowed as a
> delimiter
> The length of the string 'abc' is abc+u.
>
> And the manner in which it fails--transforming something that used to
> be a numeric expression into something that (probably) no longer
> is--sends things further off the rails when calculations are attempted
> with this value.

Exactly!

.nr desired-len 2n+\w+abc+u
.nr another-desired-len \w+abc+u+2n
.nr yet-another-desired-len \w+abc++2n
.nr double-desired-len \n[desired-len]+\w+abc++2n

At some point this gets too crazy.  When Clark implemented GNU troff, he
clearly _intended_ to take some delimiters off the table.
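
To make the reading burden concrete, here is a minimal sketch of the classic rule a human has to apply to those lines: the character right after `\w` opens the argument, and its next occurrence closes it. The helper name `split_w_escape` is hypothetical and this is only a simplified model, not troff's parser; it ignores nested escapes and assumes the closing delimiter is present.

```python
def split_w_escape(text):
    """Split text at the first \\w escape using the classic delimiter rule.

    Returns (prefix, delimiter, argument, rest).  Simplified model only:
    nested escapes and a missing closing delimiter are not handled.
    """
    start = text.index("\\w")               # position of the escape itself
    delim = text[start + 2]                 # next character is the delimiter
    arg_start = start + 3
    arg_end = text.index(delim, arg_start)  # argument runs to the next delimiter
    return text[:start], delim, text[arg_start:arg_end], text[arg_end + 1:]


if __name__ == "__main__":
    # The register expressions from the examples above.
    for expr in (r"2n+\w+abc+u", r"\w+abc+u+2n", r"\w+abc++2n"):
        print(expr, "->", split_w_escape(expr))
```

In each case the same `+` character serves once as an operator and twice as a delimiter, and only its position tells the two roles apart.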

> This change dates from the last 5 months.  In that time frame:
>
> * Bug #66481 fixed a similar back-compatibility-breaking problem with
> the \w escape and a | delimiter.  But this couldn't have introduced
> the current problem as a side effect: all it does is revert the fix
> for bug #66009, and the + delimiter worked fine in the pre-66009
> days.

I'd link to

https://git.savannah.gnu.org/cgit/groff.git/tree/?h=1.23.0

but Savannah's been undergoing a DDoS for the past several days and the
service is unavailable.

> * Bug #63142 disallowed newlines as delimiters, which was an
> un(der)used GNU aberration from the start.  But its commit message
> says "As a bonus, check starting delimters [sic] for these escape
> sequences (`\[obAZwX]`) for validity in general."  So I suspect this
> bug may have been introduced as part of that bonus.

I haven't bisected it down, but here's the commit that produced the
forbidden delimiters list in its current form.

commit 61141803ffbb9cc7ed8c27744bc8689c43a953d2
Author: G. Branden Robinson <g.branden.robin...@gmail.com>
Date:   Thu Mar 7 07:08:38 2024 -0600

    [troff]: Refactor (is_char_usable_as_delimiter).

    * src/roff/troff/input.cpp: Refactor.  Pull delimiter character
      validator into its own function operating on a character, rather than
      on an object of the token class.

      (is_char_usable_as_delimiter): New function compares `char` parameter
      to list of valid delimiters.

      (token::is_usable_as_delimiter): Refactor to call the foregoing.

My intent with that commit was not to forbid more delimiters (hence my
use of the word "refactor"), and I'm not sure that it did.  I'll need to
bisect.
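
For readers who can't reach the repository right now, the general shape of a character-based validator like the one that commit describes might look like the sketch below. It is not the code from input.cpp. The name `is_char_usable_as_delimiter_sketch` is made up, and the rejected set is an assumption limited to what this thread shows: `+` and `9` are refused by Git HEAD, newline was disallowed under bug #63142, and the other digits are my generalization from the `9` case.

```python
# Hypothetical model, NOT groff's implementation.  The rejected set holds
# only characters this thread shows being refused: '+', newline, and the
# digits (generalized from the single '9' example, which is an assumption).
REJECTED_DELIMITERS = set("0123456789+\n")


def is_char_usable_as_delimiter_sketch(c):
    """Return True if the single character c is outside the assumed rejected set."""
    return len(c) == 1 and c not in REJECTED_DELIMITERS


if __name__ == "__main__":
    for c in ("'", "|", "+", "9"):
        verdict = "allowed" if is_char_usable_as_delimiter_sketch(c) else "rejected"
        print(repr(c), verdict)
```

The point of operating on a plain character rather than a token object is that the check becomes a simple membership test, which also makes the forbidden list easy to audit.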

However, I may be tempted to just "NEWS" this because I don't think many
people use expression operators as delimiters.  I was wrong about `|` in
the wild, and indeed it's a pretty obscure symbol as a numerical
expression operator, enough so that it doesn't even have a proper name.
People have called it the "absolute motion operator", which is
misleading.  I term it the "boundary-relative motion operator", which is
a mouthful but is, at least as far as I can tell, accurate.  In many
contexts, like register assignments, it's an identity operator and does
no useful work.

The plus sign is more broadly understood to be used for math, though in
its unary form, it too is an identity operator.

Notice that Git HEAD's behavior also forbids this perversity:


$ ~/groff-HEAD/bin/groff -ww
.nr a \w9stuff9
troff: backtrace: file '<standard input>':1
troff:<standard input>:1: error: character '9' is not allowed as a delimiter
troff: backtrace: file '<standard input>':1
troff:<standard input>:1: warning: expected numeric expression, got character 's'


...whereas groff 1.23.0 did not.


$ ~/groff-stable/bin/groff -ww
.nr a \w9stuff9
.tm \na
18080




    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66686>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
