Kristaps Dzonsons <krist...@bsd.lv>:
> Browsers are confusing because HTML doesn't play with
> character-driven media. And roff(7), into which groff(1) translates
> man(7) and mdoc(7), is (significantly?) character-driven. We hack
> around this by converting -Tascii output into <pre>-wrapped
> documents. But that's not really HTML and makes browsers cry.
>
> One solution is to disregard roff(7) and regard only man(7) and
> mdoc(7). mandoc(1) does this. It gets away with it because it's
> built specifically (and in a way, dumbly) just for man(7) and
> mdoc(7) and just enough roff(7), tbl(7), etc.

This is the same approach that doclifter takes. I think it's the only
practical one. Ignore for the moment the fact that it goes through
DocBook on the way to HTML - DocBook isn't actually the point, it's a
way to enforce separation of concerns and delegate actual HTML or
PostScript generation to specially tuned back ends.

> It was suggested that groff(1) be taught a subset of roff(7) that
> can map into a tree structure, then compile that further into HTML.
> If this is possible (it sounds hard and/or awesome), and if somebody
> pulls it off and modifies the existing macros to use the "clean"
> roff(7), then groff(1) would map beautifully into HTML and not care
> whether its input is mom(7), mdoc(7), or man(7) so long as the
> underlying tmac file has been properly treated. That's a lot of
> work: identifying the relevant roff(7) macros, then teaching
> groff(1) to extract a syntax tree from those macros, then doing
> something with that syntax tree, then modifying the macro packages.
> But it sounds, to my uninformed ear, possible.

It sounds, to my informed ear, effectively impossible.

I'm not speaking theoretically. I've had to grapple with a substantial
subset of this problem in writing doclifter. I'm talking from over a
decade of experience here! The problem is *hard*.

The doclifter design is a baby AI - it contains about half a dozen
stacked and nested expert systems, patched by a rather hair-raising
pile of ad-hoc rules. In nearly forty years of hacking it is the single
most complex and algorithmically dense program I have ever written.
Even so, it is just barely adequate to the job.

Thus, I find it painfully amusing to listen to people making grand
plans like this that they would know are silly if they had read even
just the comments in the doclifter code, let alone the code.

> Even if groff(1) could do as above, and somehow carry over the
> original macro language's "meaning", it'd be only as good as its
> input language.
> To wit, Eric proposed extending man(7) with
> semantics to address exactly that. And that would give us...
> another mdoc(7).

Oh hell no. That would be pointless.

The mdoc(7) design had maximalist goals. It wanted to be a complete
semantic markup language. "Keep it simple, stupid" was, shall we say,
not high on the list of priorities.

My goals are much less ambitious. I want to replace as many low-level
troff requests as possible with extension macros that have semantic
weight. So, for example, I would replace the common cliche

    .nf
    .ft C
    random example
    .ft
    .fi

with

    .EX
    random example
    .EE

(There are more complicated and interesting possibilities in synopsis
and list markup.)

A main goal is to actually *reduce* the complexity of the man(7) input
language (which now consists not just of the macros but of troff
requests like .nf/.fi) so that non-groff renderers no longer have to
emulate large chunks of groff's typesetting capability, and can
instead treat macros like EX/EE as rudimentary semantic tags.

I would score the resulting design by the following figure of merit:
M / N, where:

M = number of low-level troff requests that can be disabled in favor
    of new extension macros.

N = number of new extension macros required.

So, a very conservative and minimalistic extension set. It is even
possible that I have already written all the extensions with a net
complexity-reducing payoff - I need to do a frequency analysis of the
manpage corpus to check that.

> So in short, why not throw more weight behind mdoc(7) instead of
> reinventing the wheel?

Because mdoc(7) is an overengineered, overcomplicated mess. Again, not
speaking from theory here but from doclifter experience. Yes, it gives
you something halfway to semantic tags, but the cost is that you have
to hand-rewrite the universe into markup that is much more complex
than man(7) and has really headache-inducing failure modes. man(7) may
be crude, but parsing and debugging it is far simpler.

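For illustration only, here is a minimal Python sketch of the .EX/.EE rewrite and the request frequency count described above. It is not how doclifter works. The function names `use_ex_ee` and `request_counts` and the `LOW_LEVEL` set are hypothetical, and the pattern only matches the exact .nf/.ft C ... .ft/.fi shape shown earlier, not its many real-world variants.

```python
import re
from collections import Counter

# Hypothetical set of low-level requests one might hope to retire;
# the real list would come out of a corpus frequency analysis.
LOW_LEVEL = {"nf", "fi", "ft", "in", "ti", "sp", "br"}

# Matches only the literal cliche: .nf, .ft C, body lines, .ft, .fi
CLICHE = re.compile(
    r"^\.nf\n\.ft C\n(?P<body>.*?)^\.ft\n\.fi\n",
    re.MULTILINE | re.DOTALL,
)


def use_ex_ee(page: str) -> str:
    """Replace each .nf/.ft C ... .ft/.fi display with .EX ... .EE."""
    # A function replacement keeps backslashes in the body untouched.
    return CLICHE.sub(lambda m: ".EX\n" + m.group("body") + ".EE\n", page)


def request_counts(page: str) -> Counter:
    """Count control lines that start with a request in LOW_LEVEL."""
    counts = Counter()
    for line in page.splitlines():
        m = re.match(r"^[.']\s*([A-Za-z]{2})\b", line)
        if m and m.group(1) in LOW_LEVEL:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    page = ".SH EXAMPLE\n.nf\n.ft C\nrandom example\n.ft\n.fi\n.SH NEXT\n"
    print(request_counts(page))             # low-level requests before
    print(use_ex_ee(page), end="")          # page after the rewrite
    print(request_counts(use_ex_ee(page)))  # low-level requests after
```

Running `request_counts` over a whole manpage corpus before and after such rewrites would give a rough handle on M (requests that stop appearing), with N being the number of extension macros the rewrites introduce.
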
It's just not worth the extra effort to go through mdoc(7) - not when
doclifter plus DocBook stylesheets will, over 93% of the time, generate
HTML that is just as good (that is, just as semantically informed) as
what mandoc can produce.
-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>