Hi Ingo,

At 2025-09-25T02:02:24+0200, Ingo Schwarze wrote:
> On the other hand, for mdoc(7), the situation is much worse than
> for man(7) in so far as the macro order .Dd .Dt .Os used to be
> mere convention, and any other order of these three macros used
> to be equally valid.  Groff-1.23 utterly broke that and now always
> starts a new manual page at .Dd, so every manual page with a different
> macro order is now totally broken with groff.

I broke it, and I broke it for a reason.

When formatting for paginated output devices (anything that isn't a terminal or HTML--the only output formats _mandoc_ natively supports[0]), the formatter must break the page whenever it starts a new _man_(7) or _mdoc_(7) document.

What happens at a page break? The page footer gets populated.

What populates the page footer? In _man_, various arguments to the `TH` call populate it. In _mdoc_, this same information is spread over multiple macro calls.

Before the macro package can break the page and write the footer, the data that populate the page footer must be in a well-defined state. In other words, you don't want some of it to come from document A and other bits of it to come from document B.

If `Dd`, `Dt`, and `Os` can appear in arbitrary order, you risk producing an incorrect page footer, sticking some of document n+1's data at the bottom of the last page of document n. I know this because I saw it happen.

Possibly I could have added support for some kind of transitional state to _groff_'s _mdoc_ package, and deferred the page break until all 3 macros had appeared regardless of ordering, but that would have added complicated logic. My impression is that you're not a fan of complicated logic, as a rule.

In my opinion, the segregation of `Dd`, `Dt`, and `Os` was a blunder in _mdoc_'s design for precisely the reason above. The siren call of "semantic markup" was so loud that, in this case, it drowned out the murmur of practical typesetting considerations. _mdoc_ should have had a `Th`. There was no reason to spread this information over multiple calls; the macros are not "parsed" or "callable". And as we've seen, the semantics of `Os` are readily distinguishable from the mnemonic its name suggestively dangles.

Furthermore, _mdoc_ documents that deviate from the canonical (conventional?) order seem rare.
In a FreeBSD bug report raising this issue,[1] Wolfram Schneider identified only 15 pages in the base/core/whatever system (all from 1 package, I think: krb5), and 371 out of about 15,000 in the ports collection. That's 2.4% of all _mdoc_ pages in the ports. (Since the ports will have a lot of _man_ pages--I'll wager _significantly_ more than they do _mdoc_ pages--the proportion of affected pages is, if not negligible, then nearly so.[3])

If someone does actually regard this as a defect in _groff_, they can say so. I have not yet seen anyone make this claim.

> > I find recent groff(1) being quite able to handle multi-.TH pages
>
> Branden has invested massive effort into making it kind-of work,

It should _totally_ work. I have confidence in my automated tests. I urge you to file bug reports if you identify defects.

> in fact so massive that i have totally lost track of what is going on.

Have you read the code? Where would explanatory comments be helpful?

In my assessment, anything we would have to do to unwind inline font family or type size changes in _mdoc_ documents is going to be more intrusive and complex than support for "PDF booking".

> If i remember correctly, he has invented lots of new registers
> along with lots of novel rules how to use them to make it work,
> wrapping himself into elaborate nets of overengineering and
> resulting in long discussions in various bug tracker tickets
> about how it is all supposed to work.  I refrained from reading
> most of that - too hard to understand and not really relevant for
> any practical purpose that i care about.

Defect reports have been made in the past and, when confirmed, they often lead to discussion. Is that unusual in your experience?

I'm open to proposed refactorings that keep all the tests passing.
If a "simplification" or "right-sizing" of the "overengineering" causes tests to fail, then the simplification is illusory--assuming you accept the premise that formatting a collection of man pages as a PDF document, or in printed and bound form, is not a crazy thing to do.

What it is, is outside of _mandoc_(1)'s mission. But as you've quite recently noted, it's not outside of _groff_'s.[2]

To address the gripe you raise above about `Dd`, `Dt`, and `Os` would require--guess what?--more registers (and/or strings) and more complexity. Is that what you want?

Regards,
Branden

[0] As I understand it, _mandoc_(1)'s PDF support comes from using an external tool to generate it from HTML. That approach has significant limitations from a typesetting perspective.

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274132

[2] https://lists.gnu.org/archive/html/bug-groff/2025-09/msg00122.html

[3] It's possible Wolfram counted _all_ man pages in the ports, regardless of macro language, in which case 2.4% is likely an accurate figure. He didn't share his method, and I don't have an easy way to crawl the entire FreeBSD ports collection. I once started to download a Git repository of it. I interrupted it because it looked like it was going to take all day, and too much disk space.