Hi Deri,

At 2026-02-22T19:13:54+0000, Deri wrote:
> On Sunday, 22 February 2026 07:18:59 GMT G. Branden Robinson wrote:
> > At 2026-02-21T13:50:16+0000, Deri wrote:
> > > This is a strange one!! The error occurs if you don't have the
> > > file doc/ gnu.eps (because PSPIC can't find it). doc.am makes it
> > > (from doc/gnu.xpm) but in "out-of-tree" builds it ends up in
> > > build/doc/gnu.eps, and not found.
> > > 
> > > The regression was introduced by commit e9da162af80 (Feb 7, 2026),
> > > reverting fixes it.
> >
> > I should point out that this is an incorrect, or at best incomplete,
> > analysis.
> 
> I'm interested to know which bits of the analysis was incorrect.

I did also say "or at best incomplete", which I maintain it was, but I
guess you're contesting that, too.  Or maybe not--see below.

> Given the starting point was:-
> 
>   GROFF    doc/webpage.html
> pre-grohtml: fatal error: 'pre-grohtml' exited with status 4; re-run
> with a different output driver to see diagnostic messages
> 
> And your initial analysis was to look at pre-grohtml and decide it
> could not happen.

When a program fails with a fatal error, why would it not make sense to
look first at the complaining program?

Incidentally it appears to me that this diagnostic:

src/preproc/html/pre-html.cpp:1866:      fatal("'%1' exited with status %2; re-run with a different output"

...may be wrongly composed.  pre-grohtml doesn't know its own exit
status at this point--`fatal()` determines that.  '%1' should probably
be populated with something else, but at the moment I'm not sure what.
troff(1) itself shouldn't ever exit with status 4.  groff(1) can, of
course.

I strongly dislike grohtml's strategy of pre-rendering the entire input
as PostScript before re-rendering it as HTML.  It's complex, it's slow,
and its complexity drove the concealment of much information that is
necessary to troubleshoot failure scenarios like this one.

Every time I have to deal with grohtml screwing up, whether due to user
error or its own bugs, this concealment of information frustrates
troubleshooting and burns up my patience.

This problem can be solved, but it's going to require a little work on
eqn or grohtml's interface to it (basically, require XHTML+MathML), and
a lot more on pic and tbl.

pic: https://savannah.gnu.org/bugs/?62890
tbl: https://savannah.gnu.org/bugs/?60052

> So was my analysis that PSPIC was causing the error incorrect?

It seems so.  Your proposed reversion changed nothing in
"tmac/pspic.tmac", and the fix I think to be better[1] doesn't either.

> Was doc.am building gnu.eps in the destdir - is this not true?

Here's another terminological problem.

Literally, no.  doc.am (or make(1)) doesn't "build gnu.eps in the
destdir".  `DESTDIR`, in a widely adopted Makefile convention that
Automake also employs,[2] refers to _the directory to which files are
installed_.  Your term, "destdir", is either undefined or irrelevant to
what happens when you run a groff "make" with its default target (or
ask for "doc/webpage.html" specifically), because neither of those
involves performing _installation_.

We've had much grief over the years with the "doc.am" file arising from
confusion of (1) the source tree, (2) the build tree, and (3) the
installation directory (`DESTDIR`), which can all be distinct.  And, if
we follow the path of glibc and binutils, as Collin Funk recently
pointed out, they will become distinct for _all_ supported build scenarios.

> Was grohtml being instructed to look in the srcdir - where it should
> not exist in an out-of-tree build - but obviously did when you were
> testing.

If by this you mean "was `-I $(doc_srcdir)` being passed to `groff
-Thtml`", then yes, but that's an invariant.  It was true in 2014, was
true before my commit on 7 February, was true after, and is still true
in my working copy, which has the fix I believe to be "correct".[1]

> Was it working before Feb 7th commit, or is that incorrect.

Yes.  And it was also working _after_ 7 February, for me and for several
of Bruno Haible's builds.  Bjarni's the only person who reported a
problem, and unfortunately he has a track record of reporting against
GNU groff problems that affect only his private fork of it.[3]

> I do not comment on whether my analysis is incomplete, it was
> obviously sufficient to steer you onto the right track to fix the
> problem

If I still had driving to do after having been "steered", then you are
conceding that it was incomplete.

> - particularly because, at the time, you were claiming you did not
> think it was a problem.

Where did I say that?  Here's what I actually wrote:

>>> I can't account for this.  No logic in pre-grohtml explicitly exits
>>> with status 4 under any cirumstances.
>>> 
>>> There is a curious wrapper macro and function in
>>> "src/preproc/html/pushback.cpp":
[snip]
>>> For one thing, line 103 should be unreachable.
>>> 
>>> But I suppose this doesn't account for the problem either, because
>>> the only caller of "localexit" is shown above (the function could
>>> have been declared `static`); the function is only ever passed the
>>> value "1".
>>> 
>>> Unless more people can reproduce this, perhaps with compiler options
>>> that less closely resemble Cyril Figgis laying down suppressing
>>> fire,[1] I don't regard your report as gating the groff 1.24.0
>>> release.
>>> 
>>> Regards,
>>> Branden

https://lists.gnu.org/archive/html/groff/2026-02/msg00085.html

Where's "I don't think this is a problem" in the foregoing?

If the existence of bugs in groff that aren't gates for the 1.24.0
release is equivalent to "not being a problem", then Savannah lists 418
tickets I can close right now.

And as it happened, "more people" _could_ reproduce this--like you, and
like me, once "steered" by you.  So the sufficient condition for gating
the 1.24.0 release was met, per criteria I established before you wrote
your first email to this list on the matter.

> You have added to my analysis the reasons why you made a mistake when
> changing doc.am to build webpage.html,

Again, let's be precise with language.  My change didn't alter "doc.am"
_to_ build "webpage.html", but _how_ that Automake file built
"webpage.html".  The "how" made most of the difference; a feature of
Git's "status" subcommand that I didn't understand (and that I now think
I've figured out how to incant as I desire[4]) made the rest.

> maybe that's why you considered my analysis was incomplete.

I think, as shown above, there are multiple premises each sufficient to
reach that conclusion.

> I would point out that your eventual fix was to add -I $(doc_builddir)
> to the make rule for webpage.html,

Right.

> and the part of my analysis you did not bother to quote when you
> accused my analysis of being incorrect:-
> 
> "Exactly the problem, gnu.eps is created in $(doc_builddir) but
> troff's -I looks in $(doc_srcdir). Of course, in-tree, these are both
> the same, so it works."

I found this confusing (and still do).  "but troff's -I looks in
$(doc_srcdir)" is, as I noted above, an invariant across the entire
history of the "doc.am" file, even with respect to a commit I haven't
pushed yet.

So saying that a thing I _don't_ need to change is "exactly the problem"
when I need to be changing some _other_ thing--even a lexically adjacent
thing--was not, at the time, illuminating to me.

> I know I'm just a quadriplegic spastic so can't possibly operate in
> the higher realms of intellectual thought, but give me a little credit
> if I strike it lucky. (Or is it luck). :-)

I think neglecting to precisely use language where doing so is necessary
to reach mutual understanding accounts for a much greater proportion of
our failures of communication than random variables or CNS differences.

> > And the problem that _should_ have been revealed was masked on my
> > system because of the lingering (and all but hidden by ".gitignore")
> > "doc/gnu.eps" file in my _source_ directory, which was (roughly)
> > your analysis.  But that was only one piece of the puzzle.
> 
> Part of it, but only in so far as explaining why you kept claiming it
> was not a problem.

"Kept claiming".  First, you invented me saying "there wasn't a problem"
once.  Now, it's multiple times.  Okay, I just identified an additional
first-order factor contributing to our failures of communication.  Try
sticking to the words I actually write.

> The real analysis was that PSPIC was the culprit,

Completely disagree.  What about the file "tmac/pspic.tmac", which
defines the `PSPIC` macro, would you change to resolve the defect Bjarni
reported?  Alternatively, what would change in "webpage.ms", which
contains the `PSPIC` macro call that ultimately failed?

The culprit was a long-standing latent bug involving insufficient
preparation of a groff(1) command line in a Makefile.[5]

> that gnu.eps was being created in the destdir but the make rule was
> directing groff to the srcdir (which is why I predicted you had a
> "rogue" copy in your srcdir),

That was a good catch.  Unfortunately my lack of mastery of "git status"
concealed the file from me for a while longer.

> and even pointed you at the commit to examine so that you could look
> at the changes which caused the issue.

Not _caused_ the issue.  _Exposed the latent issue_.

> The analysis you added was just why the previous version worked, I did
> not think that was necessary to fix the problem.

I'm not excited to play the "experienced software engineer" card on you,
but it _frequently_ happens in the field that when a bug has undergone a
root-cause analysis, its longevity (or persistence) is discovered to be
much greater than is consistent with the timing of report(s) that led to
its investigation.  People wind up saying things like...

"Wait a minute--how did this _ever_ work?"

Sadly it's often the case that the pressure is on to just close the
books, resolve the ticket, ship a hotfix to a customer, or similar, and
get back to activities recognized (or decreed) by managers as revenue-
producing ("number go up!").  The pressure is on to forget at once that
chilling moment of horror arising from the realization that the codebase
houses further unexpected mysteries.  But if you are afforded the time
by management--or seize the initiative--to indulge investigation of such
an alarming question, it can happen that you find other defects in the
code--bombs waiting to go off.

"Move fast, break stuff" is one popular mantra of software development.

"Take your time, defuse bombs" is, regrettably, a less popular one.

> For completeness you should mention that the -I flags to groff are
> passed to pre-grohtml which uses them when it calls groff -Tps -I ...
> so it is grops which actually locates and uses the gnu.eps file.

That's fair, to a point.  My emails are long enough, and digressing into
the minutiae of PostScript resource inclusion seemed beside the point.
It doesn't seem to matter whether an EPS included by the `PSPIC` macro
is somehow encoded for inclusion in GNU troff's "grout" output, or
simply referred to on the file system via a device extension command.
(The latter is what is actually the case.)

> The text in troff.1 is a little misleading since it assigns the -I
> directory search for the \X'ps: import ...' to troff, but troff does
> nothing with this, it is the fact -I is passed to grops which allows
> the import to happen. This is different from psbb, so and soquiet, but
> occurs in the same sentence.

That's a reasonable point.  Want to pitch a recast?  :)

> First, I never suggested you should revert, just that reverting fixed
> the problem,

I wouldn't say that much, especially now that the root cause is known.

Reverting the change _re-concealed a latent problem_.

It's always good to have some small thing one can toggle to make a bug
manifest and then go away again, but that doesn't mean the thing you're
toggling _is_ the problem.

A sketch of next steps from there is:

1.  reduce the size of the toggled change to the minimum syntactical
    alteration that changes the behavior; and
2.  understand why that change alters the behavior.

If the toggling change alters observable behavior _but shouldn't_, you
know that you haven't yet found the root cause of the problem.

David Agans explores this in his humbly titled 2002 book, _Debugging_.

> thus giving you a big fat clue where the problem was located.

Right.  But not, as noted, a "complete analysis".

> Now, what I mean by a regression.
> 
> A change with the unintended consequence of altering previous
> behaviour, usually in some negative way.
> 
> With this definition, I class this as a regression (unless your
> intention was to break the build!!). I have no problem with changes to
> the build process, but if the change causes stuff which used to work
> to no longer work. Intention is derived from the stated purpose of a
> commit, its no good claiming a different intention post commit when a
> problem is discovered.

So...you think commit e9da162af8 should still be reverted?

> > If you meant to suggest that [I] made the change in commit
> > e9da162af8 with insufficient diligence as to its risks or
> > consequences, then I ask you to think again.  Again, see the commit
> > log message.
> 
> Not at all, I'm suggesting it was unintentional, a mistake due to
> inadequate analysis of why the previous code was working,

The previous code shouldn't _ever_ have worked, and that it did (for me)
was, as you correctly diagnosed, because there was leftover crap in my
working copy that should not have been there and that my habits of Git
usage did not reveal to me.

How did the crap get there in the first place?  I don't know for sure,
and maybe can't, but I'd put my money on doc/doc.am being somewhat
haphazardly assembled over the years.  See also Ingo Schwarze's lengthy
screed in the "NEWS" file accompanying commit 3805d2a0e4, 12 April 2022.

> and that makes it a regression in my universe.

That definition is unsatisfactory to me.  Did _the build_ regress in
some way?  Yes.  Was _that commit_ of itself a regression?  No.  Did
that commit expose a latent bug?  Yes.  Did that bug warrant
investigation upon being reproduced by others?  HELL yes.

By affixing the term "regression" to a commit that _doesn't constitute
the introduction of a defect_, we distract our attention from the
necessary work of locating where a defect truly is.  That distraction in
turn neglects to exercise our ability to recognize defects.

In NetHack parlance, indulging distraction abuses wisdom.

Distraction is fine--if all you want is to move fast and break stuff.

Regards,
Branden

[1] https://lists.gnu.org/archive/html/groff/2026-02/msg00110.html
[2] https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/Makefile.am?h=1.24.0.rc4#n268

[3] https://savannah.gnu.org/bugs/index.php?go_report=Apply&group=groff&func=browse&set=custom&msort=0&report_id=225&advsrch=0&bug_id=&submitted_by=0&status_id=3&severity=0&category_id=0&assigned_to=0&summary=bjarnigroff&bug_group_id=0&resolution_id=0&plan_release_id=0&history_search=0&history_field=0&history_event=modified&history_date_dayfd=22&history_date_monthfd=2&history_date_yearfd=2026&max_rows=50&spamscore=5&boxoptionwanted=1#options

[4] I wanted "git status -uall" (strictly: combined with a filter), not
    "git status -uno".

[5] Specifically, the latent bug crept in with commit c81db845da, 6 April
    2022, by me, over a year before the groff 1.23.0 release.

    The commit log is revealing:

      (doc/webpage.html): There is no need to look in `doc_builddir` for
      file inclusions, since that is the current working directory when
      "webpage.ms" is processed.

    ...which was true.  I surrendered a hostage to fortune by not just
    scotching the careless "cd" business at the time.  I had less
    courage back then.  (Possibly doing so was infeasible at the time,
    demanding other feature development that took place later; I don't
    know, and have no plans to find out.)

    Applying your definitions, I introduced no defect here, latent or
    otherwise, because no regression occurred consequent to that commit.
    Many people did groff builds between 7 April 2022 and 7 February
    2026 without the build failing as Bjarni observed.

    Per my _own_ definitions, I made a mistake then nevertheless.
