Re: REVISED groff 1.24.0 readiness status, and release notes

G. Branden Robinson Mon, 18 Nov 2024 09:51:21 -0800

Hi Deri,

Thanks for your reply--I think we are making some headway toward a
meeting of the minds here.

At 2024-11-18T13:52:01+0000, Deri wrote:
> On Sunday, 17 November 2024 22:17:08 GMT G. Branden Robinson wrote:
> > 1.  What are your acceptance criteria for a 1.24.0 release
> >     candidate?
> > 2.  Who do you trust besides yourself to address any of the 3 bugs
> >     listed above?
> > 
> > How do you answer these?
> 
> It looks like comprehension may be an issue here. I understood you
> were asking who I trusted to make a good job of documenting the
> information contained in pdfmark.pdf as it pertained to creating pdfs
> with groff, as I had already answered that as PART of my response
> elsewhere,

But not only that.  I also asked you who you trust to make modifications
of the formatter to implement, for example, transformations of the
argument to `device` requests and `\X` escape sequences to facilitate
encoding of, among other things, PDF bookmark titles that might need to
express non-ASCII character code points.  (To be more precise, in #64484
we're arguing over which non-special-character escape sequences in such
an argument--like `\0` and `\v`--should also undergo some sort of
transformation, and how.)

I furthermore asked you who you trust to make modifications to gropdf.
I'll take this opportunity to clarify: I mean not just "gropdf.pl", but
"pdfmom.pl" for that matter.  I had assumed, for example, that you were
not in any way perturbed by the following commit.

commit b62f0605427caa156f23c81280f345496dd8a5cc
Author: G. Branden Robinson <g.branden.robin...@gmail.com>
Date:   Thu Sep 26 17:26:16 2024 -0500

    [pdfmom]: Handle errors and abnormal exits.

    * src/devices/gropdf/pdfmom.pl: Handle signaled and error exits from
      groff pipeline.  Improve diagnostics.  Store "basename" of executing
      command into `prog` scalar.  Gather the wait status returned by every
      use of `system()`.

      (abort): New subroutine writes a fatal diagnostic message and
      exits with status 1.

      (autopsy): New subroutine decodes a POSIX wait status to
      determine the fate of an unlucky process, and returns it as a
      human-readable string scalar.  (The interesting part is cribbed
      from Perl documentation.)

    Also discard trailing whitespace from lines.  Also update Vim modeline
    to reflect indentation practice in evidence.

    I managed to introduce a SEGV into tbl (that didn't show up with simple
    inputs) in my working copy.  To my dismay, the build just kept right on
    trucking after failing to build "groff-man-pages.pdf", because pdfmom
    ignored the return value of `system()`, concealing errors.  I don't
    think we can accept that; we need to fail fast and fail hard.  If I
    hadn't noticed the piteous "Segmentation fault" error scroll by, I might
    have overlooked my mistake for a long time.

...but now I am less certain.  This is why I keep asking you (for at
least the third time) whether you want an arrangement similar to the one
Peter Schaffter enjoys with mom(7), where the groff developers don't
make commits to "om.tmac" or the files in "momdoc" without prior
arrangement with him.  Despite having put it to you multiple times, you
never answer this question, be it in the affirmative or the negative.
Instead we just roll on with the status quo and occasionally you express
shock or amazement at my ignorance, brazenness, or other character flaw.

That response when you could, for instance, file a Savannah ticket (or
just fire off on an email to this list) much like the following:

"Hey Branden,

In your commit abcdef012345 I think you might have regressed something.

Here's my input: echo 'whatever' | pdfmom > foo.pdf

Before that commit, I got XXX, but now I get YYY.  Was this
intentional?"

I think that's a more efficient mechanism of addressing software
defects than psychologizing or character profiling.  (There is a place
for those, but I don't think it's a good first choice among colleagues.)

Maybe that's just me.

> I have no doubt you would make a great job replacing pdfmark.ms,
> combining stuff from gropdf.1 and comments in pdf.tmac. In fact I know
> you are so thorough it is likely you are likely to find plenty of bugs
> around the seldom used edges of pdf.tmac.
> 
> Which to my mind is an effusive endorsement of your ability, and a
> clear answer to the question of who do I trust to do a good job.

...with documentation.  Your view of my ability to avoid introducing
bugs is considerably less rosy, and your messages seem to me to take on
an angry tone when I do so in a part of the groff code base that is
especially important to you.

Possibly I have a different model of defect genesis than you do.  I
think developers inevitably and pretty much constantly introduce bugs.
Often, these are due to underspecification of the desired behavior,
whether formally or simply in the mind of the programmer.

To a large extent I don't think we can control the introduction of bugs.
(That statement may shock people.  Keep reading.)  I think it's a common
fallacy among journeyman programmers that, through sufficient
discipline, like following every bullet point identified by Steve
McConnell in _Code Complete_ [there are over 100 in total], and/or
achievement of sufficient mastery of a programming language, complete
with an ability to unpromptedly spout citations by section number to an
applicable ISO/IEC/IEEE standards document, they will reach a point
where they never introduce bugs.  They're just that stone cold good,
dawg/brah.

While I would not claim that's impossible--people are capable of amazing
things, sometimes at the cost of development in other areas--I think
it's a bad premise from which to proceed.

People will make mistakes.  That's inevitable, and ultimately we can't
control that.

What we can control, to a significant degree, is the _timing_ of mistake
discovery.  I don't know why I'm the only person since Bertrand to have
written automated tests for groff, but the fact that I've written 224 of
them to date is because I absolutely do not trust myself not to make
mistakes in programming.  And I don't trust anyone else not to, either.

But that doesn't mean I think they're incompetent programmers.  I think
they're human beings modifying complex systems that are almost never as
rigidly specified, at any level all the way down to the hardware (wave
hello to Spectre), as we typically pretend.

How do we know that a change is good, and lacks defects?  Not because it
passes peer review, or because the author is known to be a domain
expert, but because it passes empirical tests such that actual results
match the expected results.

So if I ever upset someone with a bug I've introduced, I'd ask them to
take at least some of the time and energy they would spend excoriating
me for doing it, and channel that into constructing an automated test
exercising the code path in question.  I don't as a rule push a set of
commits that fail our test suite.  The exceptions, as I believe I've
said multiple times before, are (1) when I write a new regression test,
to (try to) illustrate that the test exercises the erroneous code path
that I think it does (and which then passes when the fix is committed),
and (2) when making some sort of infrastructural changes that break the
build and it is impossible, or would obfuscate illuminating details, to
coalesce the changes into one commit.  When either of these happens, I
annotate the commit message appropriately.

Case 1 happens with some frequency, and as yet no one has flown
screeching into this mailing list with a condemnation of groff
developers as incompetent idiots as a result.  Knock wood.

Case 2 has happened once, I think, in the last several thousand commits.
Again, both cases were warned about in the commit message, and neither
has yet drawn reproach.

> If you are unable to do the job an alternative expedient would be be
> to reinstate just pdfmark.ms with a big red notice at the top
> explaining that its retention is an interim measure until something
> more permanent can be produced.

I feel capable.  Just need to find some time.

> > Since copy and pasting a URL would have been even more labor-saving,
> > I infer an additional goal: to air to a broader audience your low
> > assessment of my ability and/or my judgment.
> 
> I'm afraid here my comprehension is struggling. It is not clear to me
> how cut and pasting a URL involves less labour than cut and pasting an
> adjacent block of text, which has the added advantage of being a
> specific answer to the question of who I trusted to do the
> documentation, and an answer to your assertion that I consider you of
> low intellect, merely pasting a URL would have required reading the
> entire post to find the relevent answers.

See above.  I acknowledge that you're willing to give up some plaudits
to my abilities as a documentarian.  And I appreciate those.  But there
is certainly something you find questionable about my abilities or
judgment with respect to code changes, or we wouldn't keep getting into
heated arguments over them.  So, please, accept my invitation to tell me
exactly how I suck as a programmer.  If nothing else it'll go well with
popcorn for the spectators.  But I am prepared to learn from you.

> I do note that you have opened more bugs on this issue, was this to
> reduce the visibility of our discussion? (I don't believe that but you
> did try and put a particular slant on my decision to include the
> relevent part of a bug entry to answer a question you posed here).

As you probably guessed, that was to mark them as to-do items prior to
release (or RC, though I would argue that, consistently with groff's
past release management history, not _all_ documentation changes need be
made before the first RC).

> Perhaps it would help to have the knots defined.
> 
> 1) Should the release happen with a chunk of documentation on how to
> produce pdfs with groff missing?
> 
> Sub-divided into:-
> 
> A) Reinstate a version of pdfmark.pdf, or
> 
> B) A new document.
> 
> My preference is (A) since, as a minimum, we are back where we were
> before Keith's contributions were removed, at least in documenting
> groff's ability to produce pdfs. (B) could be a target for a later
> release.

Is adding material to gropdf(1) covering macros defined in "pdf.tmac"
that aren't already documented there satisfactory to achieve (A)?

> 2) How should "\0" be treated by \X and .device?
> 
> A) Treat the same as "\ " and "\~" i.e. replace with a text space.
> 
> B) Keep as current code, removal and issue a warning (for \X) or no warning 
> and replace with "0" (for .device).
> 
> My preference again is (A), but Branden is against this solution.

Right, and I've explained why.  Tenting my fingers, I'll sneakily hide
that explanation behind the URL

https://savannah.gnu.org/bugs/?64484#comment32

where it shall lie undiscovered for centuries.

I don't endorse sub-option B.2 ["no warning and replace with "0" (for
.device)].  That would, in my opinion, be wrong.  The bug came about
because I misunderstood exactly what I could expect to get back from the
`get_copy()` function when it reads an escape sequence.  Here's the
change I have pending.

diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 8fafa8a0a..25dbd2968 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -6016,14 +6016,22 @@ static void device_request()
   // Null characters can correspond to node types like vmotion_node that
   // are unrepresentable in a device extension command, and got scrubbed
   // by `asciify`.
-  for (int prevc = c; (c != '\0') && (c != '\n') && (c != EOF);
+  for (;
+      (c != '\0') && (c != '\n') && (c != EOF);
        c = get_copy(0 /* nullptr */)) {
-    if (!('\\' == c) && (prevc != '\\'))
+    if (c != '\\')
       mac.append(c);
     else {
       int c1 = get_copy(0 /* nullptr */);
-      if (c1 != '[')
-       mac.append(c1);
+      if (c1 != '[') {
+       if (csprint(c1))
+         warning (WARN_SYNTAX, "ignoring escaped '%1' in device"
+                  " extension command request argument", char(c1));
+       else
+         warning(WARN_SYNTAX, "ignoring escaped control character"
+                 " (code %1) in device extension command request"
+                 " argument", c1);
+      }
       else {
        // Does the input resemble a valid (bracket-form) special
        // character escape sequence?
diff --git a/src/roff/troff/troff.1.man b/src/roff/troff/troff.1.man
index 5a633658e..06c3a230b 100644
--- a/src/roff/troff/troff.1.man
+++ b/src/roff/troff/troff.1.man
@@ -755,6 +755,8 @@ .SH Warnings
 A self-contradictory hyphenation mode was requested;
 an empty or incomplete numeric expression was encountered;
 an operand to a numeric operator was missing;
+a recognized but inapposite escape sequence was used
+in a device extension command request;
 an attempt was made to define a recursive,
 empty,
 or nonsensical character class;

What remains is to write a regression test for it.

I solicit opinions, from anyone, on the foregoing.

Here's the behavior of the `device` request after this change.

$ printf '.device ps:exec A B\\~C\\ D\\0E\\|F\\^G\\h".5m"I\\v".5m"J\n' | 
./build/test-groff -Zww | grep '^x X'
troff:<standard input>:1: warning: ignoring escaped '0' in device extension 
command request argument
troff:<standard input>:1: warning: ignoring escaped 'h' in device extension 
command request argument
troff:<standard input>:1: warning: ignoring escaped 'v' in device extension 
command request argument
x X ps:exec A B\~C\ DE\|F\^G".5m"I".5m"J

`get_copy()` behaves in an interesting way.  `\0` is treated like a
motion (like `\h` and `\v`), but `\~`, `\space`, `\|`, or `|^` aren't.

None of them are documented as "being interpreted even in copy mode",[1]
and for things like strings and macro definitions, they don't seem to
have ever caused problems.  It is attempting to put them out as device
extensions in "grout" that forces us to confront these questions.

One might wonder, what did earlier versions of groff do, before I came
along and ruined everything with my unthinking, spasmodic recklessness?

$ printf 'foobar\n.device ps:exec A B\\~C\\ D\\0E\\|F\\^G\\h".5m"I\\v".5m"J\n' 
| ~/groff-1.22.3/bin/groff -Zww | grep '^x X'
x X ps:exec A B\~C\ D\0E\|F\^G\h".5m"I\v".5m"J
$ printf 'foobar\n.device ps:exec A B\\~C\\ D\\0E\\|F\\^G\\h".5m"I\\v".5m"J\n' 
| ~/groff-1.22.4/bin/groff -Zww | grep '^x X'
x X ps:exec A B\~C\ D\0E\|F\^G\h".5m"I\v".5m"J
$ printf 'foobar\n.device ps:exec A B\\~C\\ D\\0E\\|F\\^G\\h".5m"I\\v".5m"J\n' 
| ~/groff-stable/bin/groff -Zww | grep '^x X'
x X ps:exec A B\~C\ D\0E\|F\^G\h".5m"I\v".5m"J

So there _is_ a difference...older groffs put out `\0`, `\h`, and `\v`
for an input `\0`, `\h`, and `\v`, and with my pending change, nothing
will be put out for them, but you'll get a warning (if enabled).

(Why the leading "foobar"?  <https://savannah.gnu.org/bugs/?65977>.)

But I can change the output difference back, like this:

$ printf '.device ps:exec A B\\~C\\ D\\0E\\|F\\^G\\h".5m"I\\v".5m"J\n' | 
./build/test-groff -Zww | grep '^x X'
troff:<standard input>:1: warning: not interpreting escaped '0' in device 
extension command request argument
troff:<standard input>:1: warning: not interpreting escaped 'h' in device 
extension command request argument
troff:<standard input>:1: warning: not interpreting escaped 'v' in device 
extension command request argument
x X ps:exec A B\~C\ D\0E\|F\^G\h".5m"I\v".5m"J

The alternative diff is simple:

diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 8fafa8a0a..4743b36a7 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -6016,14 +6016,24 @@ static void device_request()
   // Null characters can correspond to node types like vmotion_node that
   // are unrepresentable in a device extension command, and got scrubbed
   // by `asciify`.
-  for (int prevc = c; (c != '\0') && (c != '\n') && (c != EOF);
+  for (;
+      (c != '\0') && (c != '\n') && (c != EOF);
        c = get_copy(0 /* nullptr */)) {
-    if (!('\\' == c) && (prevc != '\\'))
+    if (c != '\\')
       mac.append(c);
     else {
       int c1 = get_copy(0 /* nullptr */);
-      if (c1 != '[')
+      if (c1 != '[') {
+       mac.append(c);
        mac.append(c1);
+       if (csprint(c1))
+         warning (WARN_SYNTAX, "not interpreting escaped '%1' in device"
+                  " extension command request argument", char(c1));
+       else
+         warning(WARN_SYNTAX, "not interpreting escaped control character"
+                 " (code %1) in device extension command request"
+                 " argument", c1);
+      }
       else {
        // Does the input resemble a valid (bracket-form) special
        // character escape sequence?

Which do you prefer?

> (pretty simple decisions, not Gordian at all).

I agree.  So relax--next time I screw up, *no matter where in the tree*,
try using template I presented above:

"Hey Branden,

In your commit abcdef0123 I think you might have regressed something.

Here's my input: echo 'whatever' | pdfmom > foo.pdf

Before that commit, I got XXX, but now I get YYY.  Was this
intentional?"

Regards,
Branden

[1] groff(7):

Escape sequence short reference
     The escape sequences \", \#, \$, \*, \?, \a, \e, \n, \t, \g, \V,
     and \newline are interpreted even in copy mode.

signature.asc
Description: PGP signature

Re: REVISED groff 1.24.0 readiness status, and release notes

Reply via email to