Hi Ingo,

At 2025-09-25T02:02:24+0200, Ingo Schwarze wrote:
> On the other hand, for mdoc(7), the situation is much worse than
> for man(7) in so far as the macro order .Dd .Dt .Os used to be
> mere convention, and any other order of these three macros used
> to be equally valid.  Groff-1.23 utterly broke that and now always
> starts a new manual page at .Dd, so every manual page with a different
> macro order is now totally broken with groff.

I broke it, and I broke it for a reason.  When formatting for paginated
output devices (anything that isn't a terminal or HTML--the only output
formats _mandoc_ natively supports[0]), the formatter must break the
page whenever it starts a new _man_(7) or _mdoc_(7) document.

What happens at a page break?  The page footer gets populated.

What populates the page footer?

In _man_, various arguments to the `TH` call populate it.

In _mdoc_, this same information is spread over multiple macro calls.

Before the macro package can break the page and write the footer, the
data that populate the page footer must be in a well-defined state.
In other words, you don't want some of it to come from document A and
other bits of it to come from document B.

If `Dd`, `Dt`, and `Os` can appear in arbitrary order, you risk
producing an incorrect page footer, sticking some of document n+1's data
at the bottom of the last page of document n.  I know this because I saw
it happen.
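
The failure mode can be sketched with a toy model in Python.  This is
not _groff_'s _mdoc_ code; the function name `footers_emitted`, the
state dictionary, and the sample dates and titles are all invented for
illustration.  The model assumes only what is described above: the
footer is filled from whatever the three macros last stored, and one
fixed macro (here `Dd`) triggers the page break for a new document.

```python
# Toy model only; NOT groff's mdoc implementation.  All names and sample
# values here are hypothetical.
def footers_emitted(macro_calls, break_on="Dd"):
    """Return the footer written at each document boundary.

    macro_calls: (macro, value) pairs for several concatenated pages.
    break_on: the macro whose appearance starts a new document.
    """
    state = {"Dd": None, "Dt": None, "Os": None}
    footers = []
    started = False
    for macro, value in macro_calls:
        if macro == break_on:
            if started:
                # The page break writes the previous document's footer
                # from whatever the shared state holds right now.
                footers.append(dict(state))
            started = True
        state[macro] = value
    return footers


doc_a = [("Dd", "2025-01-01"), ("Dt", "A 1"), ("Os", "OS-A")]
doc_b_canonical = [("Dd", "2025-02-02"), ("Dt", "B 1"), ("Os", "OS-B")]
doc_b_reordered = [("Dt", "B 1"), ("Dd", "2025-02-02"), ("Os", "OS-B")]

clean = footers_emitted(doc_a + doc_b_canonical)
mixed = footers_emitted(doc_a + doc_b_reordered)
print(clean)
print(mixed)
```

With the canonical order, the footer written at the boundary holds only
document A's data.  With `Dt` ahead of `Dd` in document B, A's last
footer carries B's title.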

Possibly I could have added support for some kind of transitional state
to _groff_'s _mdoc_ package, deferring the page break until all three
macros had appeared, regardless of ordering, but that would have added
complicated logic.  My impression is that you're not a fan of
complicated logic, as a rule.
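
For comparison, here is a minimal Python sketch of what such a deferral
might look like.  It is an illustration under stated assumptions, not a
proposal for _groff_: the name `footers_deferred` and the sample data
are hypothetical, and the model assumes every document calls each of
the three macros at least once with nothing needing output in between.

```python
# Toy model only; NOT groff's mdoc implementation.  All names and sample
# values here are hypothetical.
def footers_deferred(macro_calls):
    """Defer the document boundary until Dd, Dt, and Os have all appeared."""
    committed = {}  # prologue of the document currently being formatted
    pending = {}    # prologue values collected so far for the next one
    footers = []
    for macro, value in macro_calls:
        pending[macro] = value
        if len(pending) == 3:  # all three prologue macros seen
            if committed:
                # Only now break the page, using the old document's data.
                footers.append(dict(committed))
            committed, pending = pending, {}
    return footers


stream = [
    ("Dd", "2025-01-01"), ("Dt", "A 1"), ("Os", "OS-A"),  # document A
    ("Dt", "B 1"), ("Dd", "2025-02-02"), ("Os", "OS-B"),  # document B
]
deferred = footers_deferred(stream)
print(deferred)
```

Even this toy needs a second copy of the prologue state.  It also
sidesteps the cases a real macro package would presumably have to
handle, such as a document that omits one of the macros or places text
between them.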

In my opinion, the segregation of `Dd`, `Dt`, and `Os` was a blunder in
_mdoc_'s design for precisely the reason above.  The siren call of
"semantic markup" was so loud that, in this case, it drowned out the
murmur of practical typesetting considerations.  _mdoc_ should have had
a `Th`.  There was no reason to spread this information over multiple
calls; the macros are not "parsed" or "callable".

And as we've seen, the semantics of `Os` are readily distinguishable
from the mnemonic its name suggestively dangles.

Furthermore, _mdoc_ documents that deviate from the canonical
(conventional?) order seem rare.  In a FreeBSD bug report raising this
issue,[1] Wolfram Schneider identified only 15 pages in the
base/core/whatever system (all from 1 package, I think: krb5), and 371
out of about 15,000 in the ports collection.  That's 2.4% of all _mdoc_
pages in the ports.  (Since the ports will have a lot of _man_
pages--I'll wager _significantly_ more than they do _mdoc_ pages--the
proportion of affected pages is, if not negligible, then nearly so.[3])

If someone does actually regard this as a defect in _groff_, they can
say so.  I have not yet seen anyone make this claim.

> > I find recent groff(1) being quite able to handle multi-.TH pages
> 
> Branden has invested massive effort into making it kind-of work,

It should _totally_ work.  I have confidence in my automated tests.  I
urge you to file bug reports if you identify defects.

> in fact so massive that i have totally lost track of what is going on.

Have you read the code?  Where would explanatory comments be helpful?
In my assessment, anything we would have to do to unwind inline font
family or type size changes in _mdoc_ documents is going to be more
intrusive and complex than support for "PDF booking".

> If i remember correctly, he has invented lots of new registers
> along with lots of novel rules how to use them to make it work,
> wrapping himself into elaborate nets of overengineering and
> resulting in long discussions in various bug tracker tickets
> about how it is all supposed to work.  I refrained from reading
> most of that - too hard to understand and not really relevant for
> any practical purpose that i care about.

Defect reports have been made in the past and, when confirmed, they
often lead to discussion.  Is that unusual in your experience?

I'm open to proposed refactorings that keep all the tests passing.  If a
"simplification" or "right-sizing" of the "overengineering" causes tests
to fail, then the simplification is illusory--assuming you accept the
premise that formatting a collection of man pages as a PDF document, or
in printed and bound form, is not a crazy thing to do.

What it is, is outside of _mandoc_(1)'s mission.  But as you've quite
recently noted, it's not outside of _groff_'s.[2]

To address the gripe you raise above about `Dd`, `Dt`, and `Os` would
require--guess what?--more registers (and/or strings) and more
complexity.  Is that what you want?

Regards,
Branden

[0] As I understand it, _mandoc_(1)'s PDF support comes from using an
    external tool to generate it from HTML.  That approach has
    significant limitations from a typesetting perspective.

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274132
[2] https://lists.gnu.org/archive/html/bug-groff/2025-09/msg00122.html

[3] It's possible Wolfram counted _all_ man pages in the ports,
    regardless of macro language, in which case 2.4% is likely an
    accurate figure.  He didn't share his method, and I don't have an
    easy way to crawl the entire FreeBSD ports collection.  I once
    started to download a Git repository of it.  I interrupted it
    because it looked like it was going to take all day, and too much
    disk space.
