Hi Branden,

On Fri, Sep 19, 2025 at 09:49:45PM -0500, G. Branden Robinson wrote:
> Hi Alex,
>
> At 2025-09-11T09:29:23-0500, G. Branden Robinson wrote:
> > At 2025-09-11T10:59:19+0200, Alejandro Colomar wrote:
> > > I'm trying hard to have reproducible builds, so that I can verify
> > > that my build system produces the same exact thing as long as the
> > > tools used with it are the same (or a reasonably similar version).
> [...]
> > > Is groff(1) just random in some sense? Would it be possible to
> > > remove that randomness from groff(1)?
> >
> > I'll need to look at what data structure is being used to house the
> > list of file names that get dumped into GNU troff(1)'s output for HTML
> > devices. I suspect that feature was put in as a grohtml(1) debugging
> > aid, as there's nothing about it that necessarily couples it to the
> > HTML format. Output for any target device could dump into its grout
> > the list of input files that were read during formatting.
>
> I can't account for this behavior.

Have you been able to reproduce it at least? Maybe I should try to find
a simple reproducer, which could help debug it.

> No container-style data structure is
> used; file names are written out to the grout stream synchronously as a
> `file_iterator` opens them. I cannot think of a mechanism by which the
> formatter could be opening files for reading in a nondeterministic order
> given a consistent input.
>
> I'd provide links to groff source via cgit.git.savannah.gnu.org, but
> once again the site is under AI DDoS attack and it's nonresponsive or
> unbearably slow.

I got a persistent AI DDoS on my server last month. They started, and
wouldn't stop. I decided to switch ports (HTTP is served on 443, and
HTTPS on 80). That broke the links to my website in a way that humans
can understand with a simple comment near the link, but stupid scripts
can't follow. The next step would be this anti-AI page, but I didn't
want to learn how to install and set up that now. But so far, switching
ports works. I still get the attack (the logs are full of 4xx from
crawler requests), but they're not enough to slow the server down.

> If you have a checkout, the functions you want are the aforementioned
> `file_iterator`'s constructor in "src/roff/troff/input.cpp", and
> `output_file::really_put_filename()` in "src/roff/troff/node.cpp".)
>
> While constructors could be called in a nondeterministic order to
> populate global objects of their type at application startup, that
> doesn't happen here. The `file_iterator` type is private to
> "input.cpp", and there are no globals of that type.
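
Regarding the simple reproducer mentioned earlier: below is a minimal
sketch of a harness for one, assuming the nondeterminism shows up on the
command's standard output. It runs one command several times and counts
the distinct SHA-256 digests of what it prints. The name
`output_digests` and the default run count are illustrative only; they
come from neither groff nor the build system discussed in this thread.

```python
import hashlib
import subprocess
import sys
from collections import Counter


def output_digests(cmd, runs=20):
    """Run `cmd` `runs` times and count the SHA-256 digests of its stdout.

    `cmd` is an argument list as accepted by subprocess.run().
    A deterministic command yields a Counter with exactly one key.
    """
    digests = Counter()
    for _ in range(runs):
        # check=True raises CalledProcessError if the command fails,
        # so a broken invocation is not mistaken for "stable output".
        result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
        digests[hashlib.sha256(result.stdout).hexdigest()] += 1
    return digests
```

More than one key in the returned counter means the output varied
between runs. A call might look like
`output_digests(["groff", "-man", "-Thtml", "-Z", "page.1"])`, where
`page.1` is a placeholder input file and the option set is only a guess
at the invocation that matters here.
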
>
> The only call sites of `file_iterator`'s constructor are:
>
>   input_stack::next_file()          // called by `next_file()`, .nx handler
>   do_source()                       // .so and .soquiet backend
>   pipe_source_request()             // .pso handler
>   process_macro_package_argument()  // `-m` command-line option handler
>   process_startup_file()            // called by main on file name literals
>   do_macro_source()                 // .mso and .msoquiet backend
>   process_input_file()              // called by main() on argv[] elements
>
> > It's possible that the data structure is effectively an unordered map,
> > and so is subject to the host system's stochastic and history-dependent
> > dynamic memory allocations. However, I'm not strongly confident about
> > that because the output doesn't seem quite random _enough_.
> >
> > Anyway, one shouldn't theorize ahead of facts, so I'll check out the
> > data structure and see what there is to see.
>
> Yeah, I got this totally wrong.
>
> Worse still, I'm stumped.
>
> Regards,
> Branden

Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).