At 2025-01-20T03:27:07+0100, Ingo Schwarze wrote:
> > The definitions are generated automatically, so all manpages written
> > in mdoc benefit from it. I assume groff mdoc + man-db doesn't
> > implement this?
>
> Not that i know of. It would actually be much harder to implement
> in groff than in mandoc because a full roff(7) implementation, by
> the basic design of how roff(7) works, lacks a semantic parse tree.
> So by the time you get to the output processors, they have no syntactic
> information left that they could work on. It's all presentational
> at that point.

Having (partially) solved the problem, I don't share your assessment of
where the trouble lies.

The lack of a semantic parse tree doesn't present significant
challenges. That's a simple matter of expanding the semantics of the
macro package used to compose the document, whether by leveraging the
existing macros `MR` or `Xr`, or by adding new ones (to non-man-page
packages).

Possibly I'll formally propose an `SX` macro for man(7) at some point.
It's not a high priority, nor likely in the near future, because the
automatic tagging problem is more fundamental and more important; with
it solved, one could automatically generate a hyperlinked multi-level
table of contents for any man(7) document, with no kludges. That
feature seems, by dint of having seen it done in ad hoc ways by
man-to-HTML converters, much more in demand than document-directed ad
libitum referencing at a finer-grained level than an entire man(7) or
mdoc(7) document.

As noted in my reply to onf, the really fiendish difficulty was
extending groff's output language so that we could express character
code points outside of the Unicode Basic Latin range in device
extension commands. (And doing so in a way that didn't break existing
features and documents.)

That's simply not a problem mandoc(1) has, because it's upside-down
relative to a *roff; there's no substrate formatter language upon which
man(7) or mdoc(7) macros are built.

There are auxiliary challenges still not completely settled, such as
the question of what can permissibly be interpolated to a device
extension command. Deri and I are in contention over this question:
I'd prefer to prohibit ("noisily", i.e. with a diagnostic message,
probably a warning) any node type that isn't a character or a word
space. Deri would prefer to, in more cases, do what the user probably
meant, particularly with respect to horizontal motion nodes.
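
To make the two positions concrete, here is a toy model in Python. It
is not groff code and not how GNU troff is implemented; the node
classes and the function names are hypothetical, and mapping a
horizontal motion to a single space under the lenient policy is only
one guess at what "do what the user probably meant" could look like.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical stand-ins for formatter node types (names are invented).
@dataclass(frozen=True)
class Char:
    code_point: int


@dataclass(frozen=True)
class WordSpace:
    pass


@dataclass(frozen=True)
class HMotion:
    amount: int  # arbitrary units, illustration only


@dataclass(frozen=True)
class VMotion:
    amount: int  # arbitrary units, illustration only


Node = Union[Char, WordSpace, HMotion, VMotion]


def text(s: str) -> List[Node]:
    """Turn plain text into character and word-space nodes."""
    return [WordSpace() if c == " " else Char(ord(c)) for c in s]


def to_device_text(nodes: List[Node],
                   lenient: bool = False) -> Tuple[str, List[str]]:
    """Flatten nodes for a device extension command.

    Strict policy: only characters and word spaces pass; anything else
    is dropped with a warning.  Lenient policy (an assumption): a
    horizontal motion is additionally accepted and becomes one space.
    """
    out: List[str] = []
    warnings: List[str] = []
    for node in nodes:
        if isinstance(node, Char):
            out.append(chr(node.code_point))
        elif isinstance(node, WordSpace):
            out.append(" ")
        elif lenient and isinstance(node, HMotion):
            out.append(" ")
        else:
            warnings.append(
                f"warning: {type(node).__name__} node not allowed in "
                "device extension command; dropped")
    return "".join(out), warnings
```

Under the strict policy, a string with two vertical motions in it comes
out as plain text plus two warnings; under the lenient one, a
horizontal motion between two characters silently becomes a space.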

What that means in practice is that situations like the one I exhibited
in my reply to onf:

.ds AUTHOR Frank \uand\d Estelle Costanza\"
.pdfinfo /Author \*[AUTHOR]

...would continue to produce warnings, because vertical motions are not
sensibly representable in device extension commands. (We're not
actually formatting text, so what would they mean?)

This point is noteworthy because groff has long carried an exhibit of
this very thing. One of Peter Schaffter's mom(7) example documents
embeds a vertical motion in the argument to an `AUTHOR` macro, and has
provoked this gripe from the formatter, I guess, ever since it landed
(with a brief respite by default in the groff 1.23.0 release because
nobody who understood what the diagnostics meant was willing to explain
them to anyone else, privately or on the mailing list--and so I made it
shut up--but now I know...I think).

My plan for resolving _that_ problem is to introduce a string
sanitizer, probably in a new macro file "string.tmac", which people can
use for common operations on strings. And that in turn is predicated
on developing a new GNU troff request to iterate through a string.

"[troff] string iteration handles escape sequences inconsistently (want
`for` request)": https://savannah.gnu.org/bugs/?62264

...and also on a new conditional expression that can tell when an
element of a string is a node. I don't think I've filed a ticket for
that one yet. (Right now, extracting substrings of strings containing
nodes is undefined behavior in the groff language.)

Little or none of this is anything mandoc will ever have to care about.

Regards,
Branden