Hi Branden,

On Fri, Sep 19, 2025 at 09:49:45PM -0500, G. Branden Robinson wrote:
> Hi Alex,
> 
> At 2025-09-11T09:29:23-0500, G. Branden Robinson wrote:
> > At 2025-09-11T10:59:19+0200, Alejandro Colomar wrote:
> > > I'm trying hard to have reproducible builds, so that I can verify
> > > that my build system produces the same exact thing as long as the
> > > tools used with it are the same (or a reasonably similar version).
> [...]
> > > Is groff(1) just random in some sense?  Would it be possible to
> > > remove that randomness from groff(1)?
> > 
> > I'll need to look at what data structure is being used to house the
> > list of file names that get dumped into GNU troff(1)'s output for HTML
> > devices.  I suspect that feature was put in as a grohtml(1) debugging
> > aid, as there's nothing about it that necessarily couples it to the
> > HTML format.  Output for any target device could dump into its grout
> > the list of input files that were read during formatting.
> 
> I can't account for this behavior.

Have you been able to reproduce it at least?  Maybe I should try to
find a simple reproducer, which could help debug it.
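
A minimal sketch of what such a reproducer harness could look like,
assuming the formatter can be run as a single command that writes its
output to stdout.  The function names and the run count are made up for
illustration, and the groff command line in the comment is only a
placeholder, not the invocation my build system uses.

```python
import hashlib
import subprocess


def output_digests(cmd, runs=20):
    """Run `cmd` `runs` times; return the set of SHA-256 digests of stdout."""
    digests = set()
    for _ in range(runs):
        # check=True makes a failing run raise instead of being hashed.
        result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
        digests.add(hashlib.sha256(result.stdout).hexdigest())
    return digests


def is_deterministic(cmd, runs=20):
    """Return True if every run produced byte-identical stdout."""
    return len(output_digests(cmd, runs)) == 1


# Hypothetical usage; "page.1" is a placeholder input file:
#   is_deterministic(["groff", "-man", "-Thtml", "page.1"])
```

If this ever returns False for a small input, that input is the
reproducer; shrinking it from there should narrow down which files are
involved.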

>  No container-style data structure is
> used; file names are written out to the grout stream synchronously as a
> `file_iterator` opens them.  I cannot think of a mechanism by which the
> formatter could be opening files for reading in a nondeterministic order
> given a consistent input.
> 
> I'd provide links to groff source via cgit.git.savannah.gnu.org, but
> once again the site is under AI DDoS attack and it's nonresponsive or
> unbearably slow.

I got a persistent AI DDoS on my server last month.  They started, and
wouldn't stop.  I decided to switch ports (HTTP is served on 443, and
HTTPS on 80).  That broke the links to my website in a way that humans
can understand with a simple comment near the link, but stupid scripts
can't follow.  The next step would be this anti-AI page, but I didn't
want to learn how to install and set that up now.  So far, switching
ports works.  I still get the attack (the logs are full of 4xx responses
to crawler requests), but it's not enough to slow the server down.

> If you have a checkout, the functions you want are the aforementioned
> `file_iterator`'s constructor in "src/roff/troff/input.cpp", and
> `output_file::really_put_filename()` in "src/roff/troff/node.cpp".
> 
> While constructors could be called in a nondeterministic order to
> populate global objects of their type at application startup, that
> doesn't happen here.  The `file_iterator` type is private to
> "input.cpp", and there are no globals of that type.
> 
> The only call sites of `file_iterator`'s constructor are:
> 
> input_stack::next_file() // called by `next_file()`, .nx handler
> do_source() // .so and .soquiet backend
> pipe_source_request() // .pso handler
> process_macro_package_argument() // `-m` command-line option handler
> process_startup_file() // called by main on file name literals
> do_macro_source() // .mso and .msoquiet backend
> process_input_file() // called by main() on argv[] elements
> 
> > It's possible that the data structure is effectively an unordered map,
> > and so is subject to the host system's stochastic and history-dependent
> > dynamic memory allocations.  However, I'm not strongly confident about
> > that because the output doesn't seem quite random _enough_.
> > 
> > Anyway, one shouldn't theorize ahead of facts, so I'll check out the
> > data structure and see what there is to see.
> 
> Yeah, I got this totally wrong.
> 
> Worse still, I'm stumped.
> 
> Regards,
> Branden

Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
