At 2025-01-20T03:27:07+0100, Ingo Schwarze wrote:
> > The definitions are generated automatically, so all manpages written
> > in mdoc benefit from it.  I assume groff mdoc + man-db doesn't
> > implement this?
> 
> Not that i know of.  It would actually be much harder to implement
> in groff than in mandoc because a full roff(7) implementation, by
> the basic design of how roff(7) works, lacks a semantic parse tree.
> So by the time you get to the output processors, they have no syntactic
> information left that they could work on.  It's all presentational
> at that point.

Having (partially) solved the problem, I don't share your assessment of
where the trouble lies.  The lack of a semantic parse tree doesn't
present significant challenges.  That's a simple matter of expanding the
semantics of the macro package used to compose the document, whether by
leveraging existing macros like `MR` or `Xr`, or by adding new ones (to
non-man-page packages).

Possibly I'll formally propose an `SX` macro for man(7) at some point.
It's not a high priority, and it won't happen in the near future,
because the automatic tagging problem is more fundamental and more
important; with that solved, one could automatically generate a
hyperlinked multi-level table of contents for any man(7) document, with
no kludges.  Judging by the ad hoc ways man-to-HTML converters already
do it, that feature seems much more in demand than document-directed ad
libitum referencing at a finer-grained level than an entire man(7) or
mdoc(7) document.

As noted in my reply to onf, the really fiendish difficulty was
extending groff's output language so that we could express character
code points outside of the Unicode Basic Latin range in device extension
commands.  (And doing so in a way that didn't break existing features
and documents.)  That's simply not a problem mandoc(1) has, because it's
upside-down relative to a *roff; there's no substrate formatter language
upon which man(7) or mdoc(7) macros are built.

There are auxiliary challenges still not completely settled, such as the
question of what can permissibly be interpolated into a device
extension command.  Deri and I are in contention over this question: I'd
prefer to prohibit ("noisily", i.e. with a diagnostic message, probably
a warning) any node type that isn't a character or a word space.  Deri
would prefer to, in more cases, do what the user probably meant,
particularly with respect to horizontal motion nodes.

What that means in practice is that situations like one I exhibited in
my reply to onf:

    .ds AUTHOR Frank \uand\d Estelle Costanza\"
    .pdfinfo /Author \*[AUTHOR]

...would continue to produce warnings, because vertical motions are not
sensibly representable in device extension commands.  (We're not
actually formatting text, so what would they mean?)

This point is noteworthy because groff has long carried an exhibit of
this very thing.  One of Peter Schaffter's mom(7) example documents
embeds a vertical motion in the argument to an `AUTHOR` macro, and has
provoked this gripe from the formatter, I guess, ever since it landed.
(There was a brief respite by default in the groff 1.23.0 release:
nobody who understood what the diagnostics meant was willing to explain
them to anyone else, privately or on the mailing list, so I made the
formatter shut up.  But now I know...I think.)

My plan for resolving _that_ problem is to introduce a string sanitizer,
probably in a new macro file "string.tmac", which people can use for
common operations on strings.

And that in turn is predicated on developing a new GNU troff request to
iterate through a string.

"[troff] string iteration handles escape sequences inconsistently (want
`for` request)": https://savannah.gnu.org/bugs/?62264

...and also on a new conditional expression that can tell when an
element of a string is a node.  I don't think I've filed a ticket for
that one yet.  (Right now, extracting substrings of strings containing
nodes is undefined behavior in the groff language.)
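
To make the sanitizer idea concrete, here is a rough model of it in
Python rather than groff, since the requests it would depend on don't
exist yet.  Everything in it is an assumption made for illustration: the
name `sanitize`, the `lenient` flag (standing in for the "do what the
user probably meant" position on horizontal motions), and the tiny
subset of escape sequences it recognizes.  It is not how GNU troff
tokenizes input.

```python
import re

# Hypothetical, heavily simplified tokenizer: real groff input has many
# more escape sequences and delimiter forms than are modeled here.
TOKEN = re.compile(r"""
    (?P<vmotion>\\[ud]|\\v'[^']*')   # vertical motions: \u, \d, \v'...'
  | (?P<hmotion>\\h'[^']*')          # horizontal motion: \h'...'
  | (?P<comment>\\".*$)              # comment runs to end of line
  | (?P<other>\\.)                   # any other escape: treated as a node
  | (?P<char>.)                      # ordinary character or word space
""", re.VERBOSE)


def sanitize(s, lenient=False):
    """Return (clean_text, warnings) for a string bound for a device
    extension command.  Only characters and word spaces survive."""
    out = []
    warnings = []
    for m in TOKEN.finditer(s):
        kind, text = m.lastgroup, m.group()
        if kind == "char":
            out.append(text)
        elif kind == "comment":
            break                     # nothing after a comment is data
        elif kind == "hmotion" and lenient:
            out.append(" ")           # lenient policy: motion becomes a space
        else:
            # strict policy: discard the node, but say so ("noisily")
            warnings.append(f"discarding {kind} node {text!r}")
    return "".join(out), warnings
```

Fed the `AUTHOR` string from the example above, this yields the text
"Frank and Estelle Costanza" plus two warnings, one per vertical motion.
With `lenient=True`, an input like `A\h'1m'B` comes out as "A B" with no
warning; under the strict default it comes out as "AB" with one.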

Little or none of this is anything mandoc will ever have to care about.

Regards,
Branden
