[giving this sub-thread a more appropriate title]

At 2025-02-22T23:23:35+0100, onf wrote:
> [re-arranging]
> 
> On Sat Feb 22, 2025 at 9:37 PM CET, G. Branden Robinson wrote:
> > At 2025-02-22T16:38:00+0100, onf wrote:
> > [...]
> > > I feel like changing ab, hpf, hpfa, nx, so, and tm (the others aren't
> > > implemented by neatroff) to allow spaces in the middle of their
> > > arguments might have more chance of success.
> >
> > I begin to get the feeling you're not paying close attention to what
> > I'm saying in emails, what I've quoted from the "NEWS" file,[1] or
> > how GNU troff actually behaves.
> 
> I felt the same way about you, but it seems I might just be
> misunderstanding your words.
[...]
> To me, this pretty clearly says that this:
>   $ echo '.ab DONE' > nx
>   $ nroff << EOF
>   .nx nx \" continue with file nx
>   EOF
> will no longer work.

Yes, I see now--you're focused almost exclusively on cases where people,
when using requests that accept file name arguments, follow those
arguments with spaces and then comments.

I invite you to perform statistical measurement of such occurrences and
compare that measurement to uses of the same requests without a trailing
space and comment.
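
One rough way to take such a measurement is sketched below.  It is only
a sketch: the list of requests is an assumption made for illustration,
the names `classify` and `tally` are hypothetical, and it does not try
to handle an escaped backslash immediately before a double quote.

```python
import re
from collections import Counter

# Requests assumed (for this illustration only) to take a file name.
FILE_REQUESTS = ("hpf", "hpfa", "nx", "so")

# Control character, optional blanks, request name, blanks, the rest.
CONTROL_LINE = re.compile(r"^[.'][ \t]*([A-Za-z0-9]+)[ \t]+(.*)$")

def classify(line):
    """Return None, 'plain', 'closed-up', or 'spaced' for one input line."""
    m = CONTROL_LINE.match(line.rstrip("\n"))
    if m is None or m.group(1) not in FILE_REQUESTS:
        return None              # not a request of interest
    rest = m.group(2)
    pos = rest.find('\\"')       # start of a \" comment, if any
    if pos < 0:
        return "plain"           # no comment on the line
    if pos > 0 and rest[pos - 1] in " \t":
        return "spaced"          # gap between argument and comment
    return "closed-up"           # comment directly follows the argument

def tally(lines):
    """Count how often each category occurs in an iterable of lines."""
    return Counter(c for c in map(classify, lines) if c is not None)
```

Feeding `tally` the lines of a corpus of roff sources (opened with
`errors="replace"`, since old documents are often not UTF-8) gives the
ratio of "spaced" uses to the other two categories.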

> Testing with git master though, it does work.
>   $ git describe
>   1.23.0-2819-g0d8867598
>   $ ./nroff << EOF
>   .nx nx \" continue with file nx
>   EOF
>   DONE

I assume you populated a file named "nx" with ".ab DONE" or
similar...?

$ git describe
1.23.0-2822-gfa4612e5a
$ ./build/test-groff -Tascii <<EOF
.nx nx
EOF
DONE
$ printf '.ab DONE\n' > nx

Now let's try it with a trailing space and comment.

$ ./build/test-groff -Tascii <<EOF
.nx nx \" skip to the file named 'nx'
EOF
troff: error: cannot open 'nx ': No such file or directory

...so this "misbehaves", if you will, exactly as described in the "NEWS"
file.

But we're buying something with this putative disaster of backward
incompatibility.  And that's this.

$ printf '.ab DONE\n' > "my next file"
$ ./build/test-groff -Tascii <<EOF
.nx my next file\" skip to the file named 'my next file'
EOF
DONE

> But that's probably because it doesn't contain these changes yet:
>   $ echo '.ab DONE' > 'foo bar'
>   $ cat 'foo bar'
>   .ab DONE
>   $ ./nroff << EOF
>   .nx foo bar \" skip to next file
>   EOF
>   troff:<standard input>:1: error: can't open 'foo': No such file or directory
> 
> Which is weird because git log claims it does:
>   $ git log --pretty=reference | grep spacey | grep -w nx
>   4812e5548 ([troff]: `nx` now accepts spacey file names., 2024-12-07)
> 
> And just to be sure...
>   $ make
>   make  all-recursive
>   make[1]: Entering directory '/home/ondra/Downloads/git/groff'
>   make[2]: Entering directory '/home/ondra/Downloads/git/groff'
>   make[2]: Leaving directory '/home/ondra/Downloads/git/groff'
>   make[1]: Leaving directory '/home/ondra/Downloads/git/groff'
> 
> (That is, all targets are up to date.)

I can't account for why your build is behaving differently from mine.
(The explanation does not lie in the 3 commits I haven't pushed yet.[1])

> > I encourage you to examine the behavior of AT&T troff with respect
> > to the `ab`, `nx`, `so`, and `tm` requests.  I have pointed you
> > several times to the comments in Savannah #65108,[2] to apparently
> > little avail.
> 
> > > I forgot to emphasize the likely largest obstacle, which is the fact
> > > that it
> >
> > (presumably Savannah #66625, and/or changes already in Git)
> 
> No, Savannah #65108.

Savannah #65108 is "Open", not "Closed".  Not all of the changes it
contemplates have been made yet (and some, I think, will not be).
Fortunately, the bits not yet implemented are not matters you've brought
up.[2]  (But I can foresee where we're headed; I have visions of
you defending the honor of people who've named their roff input files
things like 'foo\[u007F]bar', where the single quotes indicate a
shell-like literal.)

> > > would break compatibility with documents written for neatroff.
> >
> > In what way?
> 
> Reading the ticket #65108, comment #3 says:
>   1.  An argument of type `file` (as described in groff(7)) to a
>       request consumes the rest of the rest [sic] of the line.

Right.  The parser discards comments when reading in copy mode as it
always has.

>   2.  Unescaped spaces can therefore populate the argument.

Right.

>   3.  A leading double quote is recognized and removed; a file name
>       can thus start with spaces.

Right.

>   4.  Any other/remaining double quotes are not treated specially.

Right.

>   5.  Only the following escape sequences are recognized.
>   5a. `\ ` (backslash-space) represents a space.  It is not necessary
>       in troff, but is recognized to avoid disrupting existing
>       soelim(1) usage.
>   5b. `\"` ends the file name argument and starts a comment.
>   5c. `\\` represents a (single) literal backslash.  It is handled
>       however the system's standard C library wants to handle it.
>   5d. `\[u00XX]` where each X is an uppercase hexadecimal digit
>       encodes a character.  Only codes in the range 00-1F and 80-FF
>       are accepted in this syntax; those in the range 20-7F are
>       ignored with a diagnostic advising the user to deobfuscate their
>       inputs.

None of the points in item 5 are implemented yet, and after discussion with
Dave Kemper, recorded in Savannah #65108, I no longer seek to implement
points 5a or 5b.  I'd like to implement 5d for groff 1.25; with it,
point 5c is not strictly necessary, as one could express a literal
backslash in a file name as `\[u005C]`, but that seems to me
unnecessarily unrewarding of well-established Unix (and *roff) habits.
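
To make points 5c and 5d concrete, here is a minimal sketch of how such
decoding might look.  It is not groff's implementation (none exists
yet); the function name is hypothetical, and it assumes that "ignored"
means the sequence is dropped from the file name after a diagnostic is
recorded.

```python
import re

# Either \[u00XX] with two uppercase hex digits, or a doubled backslash.
ESCAPE = re.compile(r"\\\[u00([0-9A-F]{2})\]|\\\\")

def decode_file_name(arg):
    """Return (decoded name, list of diagnostics) for a file argument."""
    diagnostics = []

    def replace(m):
        if m.group(1) is None:
            return "\\"                      # point 5c: literal backslash
        code = int(m.group(1), 16)
        if 0x20 <= code <= 0x7F:             # point 5d: rejected range
            diagnostics.append(
                "ignoring %s; write the character directly" % m.group(0))
            return ""                        # assumption: sequence dropped
        return chr(code)                     # 00-1F and 80-FF accepted

    return ESCAPE.sub(replace, arg), diagnostics
```

With this sketch, `decode_file_name(r"foo\[u0009]bar")` yields a name
containing a tab, while `decode_file_name(r"foo\[u007F]bar")` yields
"foobar" plus one diagnostic.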

(An open question is whether to treat only the backslash as a backslash,
or to recognize the configured escape character in this role, even
though what we're constructing is not actually a groff escape sequence.
I can see arguments either way.  Much depends, I think, on why the
escape character is configurable in the first place.  If it's to ease
the production of *roff input embedded in another language, like the
shell, that would suggest one answer.)

> The combination of points 1, 5, and 5b seems to imply that
>   .nx nx \" load file nx

Point 1 is sufficient to imply that; neither point 5 in toto nor its
sub-point 5b are relevant here.  (The comment escape sequence, and what
follows it, do not survive lexical analysis to be "seen" by the request
handler.)

> will be interpretted as loading file 'nx ' (without the quotes).

Right.

> That would obviously break compatibility,

...in what I claim to be unusual cases...

> because other troffs interpret it as loading file 'nx':
>   $ 9 nroff << EOF
>   .nx nx \" load file nx
>   EOF
>   DONE

Right.
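
The difference between the two readings can be modeled in a few lines.
This is a simplified sketch, not groff's parser: the function names are
hypothetical, comment removal is reduced to a substring search, and the
"traditional" rule is approximated as "the argument ends at the first
space".

```python
def strip_comment(line):
    """Drop a \\" comment, as happens before the request handler runs."""
    pos = line.find('\\"')
    return line if pos < 0 else line[:pos]

def file_argument_new(line):
    """Points 1-4: rest of line is the argument; one leading '"' is removed."""
    parts = strip_comment(line).split(None, 1)   # request name, remainder
    rest = parts[1] if len(parts) > 1 else ""
    return rest[1:] if rest.startswith('"') else rest

def file_argument_traditional(line):
    """Approximation of the older rule: the argument stops at a space."""
    parts = strip_comment(line).split()
    return parts[1] if len(parts) > 1 else ""

line = '.nx nx \\" load file nx'
print(repr(file_argument_new(line)))          # 'nx '
print(repr(file_argument_traditional(line)))  # 'nx'
```

The same model gives 'my next file' under the new rule and 'my' under
the traditional one for `.nx my next file\" comment`.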

> > > Ali seems averse to breaking backwards compatibility with both
> > > AT&T troff and past versions of neatroff.
> >
> > That's a reasonable inclination.  I think a more accurate
> > characterization of the changes I have made and am proposing,
> > however, is that things that simply never could work before, now
> > can.
> 
> If that's an accurate characterization, then please explain what that
> NEWS file is about, because I don't get it.

Again your references are too broad and too general.  "NEWS" is a
4,398-line file.  Even the part covering only "VERSION next" is ~760
lines.

There is a behavior change.  It does "break compatibility" in cases I
consider rare, marginal, and easily worked around.  That, you might say,
is what the NEWS file is about.

At the same time, groff users have for about 35 years had to "close up"
gaps after macro arguments and before comments when these gaps used
tabs.

.SH Caveats             \" noted by XQZ in email of 2014-06-15

That works in Plan 9 troff, as I understand it, but never in groff.  You
need to either replace the tabs with spaces...

.SH Caveats    \" noted by XQZ in email of 2014-06-15

...or eliminate the gap.

.SH Caveats\" noted by XQZ in email of 2014-06-15

Users of the `ds` and `as` requests have _always_ had to eliminate gaps
before comments, with whichever whitespace character, if they sought to
annotate such requests without introducing unwanted spaces or tabs.

$ pdp11 ./v7.simh

PDP-11 simulator V3.8-1
Disabling XQ
@boot
New Boot, known devices are hp ht rk rl rp tm vt
: rl(0,0)rl2unix
mem = 177856
# Restricted rights: Use, duplication, or disclosure
is subject to restrictions stated in your contract with
Western Electric Company, Inc.
Thu Sep 22 23:25:29 EDT 1988

login: dmr
$ nroff <<EOF
> .ds f1 here's \" tab embedded in string contents
> .as f1 a \" space embedded in string contents
> .as f1 surprise!\" no spaces or tabs after exclamation point
> \*(f1foobar
> .pl \n(nlu
> EOF
here's  a surprise!foobar

(V7 Unix nroff doesn't actually truncate the page early.  Huh.)
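
The same effect can be imitated with a small model of `ds` and `as`.
This is a sketch under stated assumptions: the helper names are
hypothetical, the first input line is assumed to carry a tab before the
comment (as its comment says), and nroff's default tab stops are assumed
to fall every 8 columns.

```python
def strip_comment(line):
    """Drop a \\" comment; whatever precedes it stays, blanks included."""
    pos = line.find('\\"')
    return line if pos < 0 else line[:pos]

def apply_string_request(strings, line):
    """Model `.ds name contents` and `.as name contents` on a dict."""
    parts = strip_comment(line).split(None, 2)   # request, name, contents
    request, name = parts[0], parts[1]
    contents = parts[2] if len(parts) > 2 else ""
    if contents.startswith('"'):
        contents = contents[1:]                  # leading quote is dropped
    if request == ".ds":
        strings[name] = contents                 # define
    else:
        strings[name] = strings.get(name, "") + contents  # append

strings = {}
for line in [
    ".ds f1 here's\t\\\" tab embedded in string contents",
    ".as f1 a \\\" space embedded in string contents",
    ".as f1 surprise!\\\" no spaces or tabs after exclamation point",
]:
    apply_string_request(strings, line)

print((strings["f1"] + "foobar").expandtabs(8))
```

Only the third request, where the comment is closed up against the
text, leaves no trailing whitespace in the string.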

> It doesn't, because there is not a single line containing a comment
> separated from the argument by a space. Read the examples I gave
> above.

I understand now.  You are bewailing the removal of support for an even
narrower class of practice than I had thought.

I guess I don't perceive this as the crisis that you do, nor as an act
of cruelty to groff's user community, given that they can now manipulate
file names that include spaces, which as noted in the thread is not an
uncommon practice among those who are not conservative Unix veterans
scarred by the fires of shell scripts gone astray.

Were it not for that benefit, I'd probably leave well enough alone.  But
that, plus the obvious upside in reducing the number of rules and
special cases in *roff (or at least groff) grammar one has to keep in
the head, seems worthwhile to me.  And so far, at least, it has drawn no
other protest.

Regards,
Branden

[1] $ git log --oneline | head -n 4
    fa4612e5a [libgroff]: Include <string.h>.
    cbad0724f [doc,man]: Doc additional "char" warning scenario.
    b2f1f69fa groff_mm(7): `Aumt` is a string, not a register.
    0d8867598 [doc,man]: Avoid documentary faceplant.
