Re: [Groff] Manpages, groff, and the browser.

Eric S. Raymond Sun, 16 Mar 2014 18:06:07 -0700

Kristaps Dzonsons <krist...@bsd.lv>:
> Browsers are confusing because HTML doesn't play with
> character-driven media.  And roff(7), into which groff(1) translates
> man(7) and mdoc(7), is (significantly?) character-driven.  We hack
> around this by converting -Tascii output into <pre>-wrapped
> documents.  But that's not really HTML and makes browsers cry.
> 
> One solution is to disregard roff(7) and regard only man(7) and
> mdoc(7).  mandoc(1) does this.  It gets away with it because it's
> built specifically (and in a way, dumbly) just for man(7) and
> mdoc(7) and just enough roff(7), tbl(7), etc.


This is the same approach that doclifter takes.  I think it's
the only practical one.

Ignore for the moment the fact that it goes through DocBook on the way
to HTML - DocBook isn't actually the point, it's a way to enforce
separation of concerns and delegate actual HTML or PostScript
generation to specially tuned back ends.

> It was suggested that groff(1) be taught a subset of roff(7) that
> can map into a tree structure, then compile that further into HTML.
> If this is possible (it sounds hard and/or awesome), and if somebody
> pulls it off and modifies the existing macros to use the "clean"
> roff(7), then groff(1) would map beautifully into HTML and not care
> whether its input is mom(7), mdoc(7), or man(7) so long as the
> underlying tmac file has been properly treated.  That's a lot of
> work: identifying the relevant roff(7) macros, then teaching
> groff(1) to extract a syntax tree from those macros, then doing
> something with that syntax tree, then modifying the macro packages.
> But it sounds, to my uninformed ear, possible.

It sounds, to my informed ear, effectively impossible.

I'm not speaking theoretically.  I've had to grapple with a
substantial subset of this problem in writing doclifter.  I'm talking
from over a decade of experience here!

The problem is *hard*.  The doclifter design is a baby AI - it
contains about half a dozen stacked and nested expert systems, patched
by a rather hair-raising pile of ad-hoc rules.  In nearly forty years of
hacking it is the single most complex and algorithmically dense program
I have ever written.  Even so, it is just barely adequate to the job.

Thus, I find it painfully amusing to listen to people making grand plans
like this that they would know are silly if they had read even just
the comments in the doclifter code, let alone the code.

> Even if groff(1) could do as above, and somehow carry over the
> original macro language's "meaning", it'd be only as good as its
> input language.  To wit, Eric proposed extending man(7) with
> semantics to address exactly that.  And that would give us...
> another mdoc(7).

Oh hell no. That would be pointless.

The mdoc(7) design had maximalist goals.  It wanted to be a complete
semantic markup language.  "Keep it simple, stupid" was, shall we say,
not high on the list of priorities.

My goals are much less ambitious.  I want to replace as many low-level
troff requests as possible with extension macros that have semantic
weight.  So, for example, I would replace the common cliche

.nf
.ft C
random example
.ft
.fi

with 

.EX
random example
.EE

(There are more complicated and interesting possibilities in synopsis
and list markup.)

A main goal is to actually *reduce* the complexity of the man(7) input
language (which now consists not just of the macros but of troff
requests like .nf/.fi) so that non-groff renderers no longer have to
emulate large chunks of groff's typesetting capability, instead being
able to treat macros like EX/EE as rudimentary semantic tags.
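
To make the "rudimentary semantic tags" point concrete, here is a minimal
sketch of how a non-groff renderer might handle EX/EE.  It is an
illustration only, not code from doclifter or mandoc; the function name
render_examples and the choice of <pre> as the output element are
assumptions.

```python
import html


def render_examples(lines):
    """Map .EX/.EE pairs to <pre>...</pre>; escape every other line.

    Hypothetical helper for illustration.  It needs no knowledge of
    fill mode or font state, which is the point of a semantic macro.
    """
    out = []
    in_example = False
    for line in lines:
        token = line.strip()
        if token == ".EX" and not in_example:
            out.append("<pre>")
            in_example = True
        elif token == ".EE" and in_example:
            out.append("</pre>")
            in_example = False
        else:
            out.append(html.escape(line))
    if in_example:
        # Close an unterminated example so the output stays well-formed.
        out.append("</pre>")
    return out
```

Rendering the .nf/.ft C/.ft/.fi form correctly would instead require the
renderer to track fill mode and the font stack, which is exactly the
groff emulation the extension macros are meant to make unnecessary.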

I would score the resulting design by the following figure of merit: M / N,
where:

M = number of low-level troff requests that can be disabled in favor of
new extension macros.

N = number of new extension macros required.

So, a very conservative and minimalistic extension set.  It is even
possible that I have already written all the extensions with a net
complexity-reducing payoff - I need to do a frequency analysis of the
manpage corpus to check that.
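
A frequency analysis of that kind could be sketched roughly as follows.
This is a hedged illustration, not an existing tool: the request list in
LOW_LEVEL_REQUESTS is a small, non-exhaustive sample, and the names
count_requests and figure_of_merit are hypothetical.

```python
import re
from collections import Counter

# Illustrative, non-exhaustive sample of low-level troff requests.
LOW_LEVEL_REQUESTS = {"nf", "fi", "ft", "br", "sp", "in", "ti", "ce", "ta"}

# A control line starts with '.' or "'", then optional blanks, then a name.
# Comment lines such as .\" are skipped because the name may not start
# with a backslash.
CONTROL_LINE = re.compile(r"^[.'][ \t]*([^\s\\]+)")


def count_requests(pages):
    """Count low-level request uses across an iterable of page sources."""
    counts = Counter()
    for text in pages:
        for line in text.splitlines():
            match = CONTROL_LINE.match(line)
            if match and match.group(1) in LOW_LEVEL_REQUESTS:
                counts[match.group(1)] += 1
    return counts


def figure_of_merit(retired_requests, new_macros):
    """Return M / N: distinct requests retired per new macro added."""
    return len(set(retired_requests)) / len(set(new_macros))
```

The counts only show which requests are common enough to be worth
targeting.  Deciding whether a given request can actually be disabled
once a macro exists still takes human judgment about how it is used in
the corpus.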

> So in short, why not throw more weight behind mdoc(7) instead of
> reinventing the wheel?

Because mdoc(7) is an overengineered, overcomplicated mess. Again,
not speaking from theory here but from doclifter experience.

Yes, it gives you something halfway to semantic tags, but the cost is
that you have to hand-rewrite the universe into markup that is much
more complex than man(7) and has really headache-inducing failure
modes.  man(7) may be crude, but parsing and debugging it is far
simpler.

It's just not worth the extra effort to go through mdoc(7) - not when
doclifter plus DocBook stylesheets will over 93% of the time generate
HTML that is just as good (that is, just as semantically informed) as
mandoc can do.
-- 
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
