Re: [Groff] Regarding HTML rendering

Steve Izma Thu, 17 Aug 2017 13:54:13 -0700

On Wed, Aug 16, 2017 at 08:28:53PM +0200, Ingo Schwarze wrote:
> Subject: Re: [Groff] Regarding HTML rendering
> ...
> The roff language is a poor fit for what HTML excels in,
> namely, hierarchical representation of information and semantic
> markup. The HTML language is a poor fit for what groff excels
> in, namely, exact positioning of glyphs and lines on paper.  So
> the programmer is likely to spend lots of time trying to write
> heuristic code to somehow transform the linear flow of pure
> formatting instruction roff provides into something structured
> and semantically enriched. Yet the user will likely be
> disappointed because they won't find the precision and elegance
> they are used to from groff PostScript and PDF output in the
> HTML result.


While I agree with the above analysis completely, I also suggest
dealing with the problem by abstracting one level up. The roff
language is for typesetting on a cut sheet of paper, whereas HTML
is for presentation on a scrolling screen. The only thing they
have in common is the content, and attempts to display the
content identically on such different media do not result in a
pleasant experience for the reader. The three addresses across
the top of the letter Mikkel produced are a case in point: for
paper output, you know the dimensions within which the three
headings can reasonably fit; for output meant for reading in a
browser, you have no idea what the dimensions of the screen will
be and the right-hand address might disappear completely if set
flush right.

Anyone who intends to output content to such different media
should use some sort of a higher-level markup whose purpose is to
delineate the structure of the document, then filter that through
the typographic system appropriate for the output medium. This
would mean that for paper output the filter would produce a MOM
source file and for Web-based output a (probably fairly simple)
HTML file. All formatting for the former
would be determined by the MOM macro package and for the latter
by CSS style sheets, which one would probably need to write.
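
As a minimal sketch of that arrangement, here is one structural source
rendered two ways in Python. The (kind, text) list, the function names,
and the style.css file name are assumptions made up for illustration;
the MOM macros shown (.TITLE, .PRINTSTYLE, .START, .HEADING, .PP) are
the basic ones as I understand them from the mom documentation.

```python
from html import escape

# Hypothetical structural source: (kind, text) pairs, no formatting at all.
DOC = [
    ("title", "Regarding HTML rendering"),
    ("heading", "Media differ"),
    ("para", "Paper has fixed dimensions; a browser window does not."),
]


def to_mom(doc):
    """Render the structure as mom source for paper output.

    Sketch only: text containing double quotes, or lines starting with
    '.' or "'", would need protecting before being handed to groff.
    """
    head, body = [], []
    for kind, text in doc:
        if kind == "title":
            head.append(f'.TITLE "{text}"')
        elif kind == "heading":
            body.append(f'.HEADING 1 "{text}"')
        elif kind == "para":
            body += [".PP", text]
        else:
            raise ValueError(f"unknown element kind: {kind}")
    return "\n".join(head + [".PRINTSTYLE TYPESET", ".START"] + body) + "\n"


def to_html(doc):
    """Render the same structure as plain HTML; appearance is left to CSS."""
    title, body = "", []
    for kind, text in doc:
        if kind == "title":
            title = escape(text)
            body.append(f"<h1>{title}</h1>")
        elif kind == "heading":
            body.append(f"<h2>{escape(text)}</h2>")
        elif kind == "para":
            body.append(f"<p>{escape(text)}</p>")
        else:
            raise ValueError(f"unknown element kind: {kind}")
    head = [
        "<!DOCTYPE html>",
        '<html lang="en">',
        "<head>",
        '<meta charset="utf-8">',
        f"<title>{title}</title>",
        '<link rel="stylesheet" href="style.css">',  # hypothetical style sheet
        "</head>",
        "<body>",
    ]
    return "\n".join(head + body + ["</body>", "</html>"]) + "\n"
```

Neither function knows anything about margins, fonts, or screen width;
those decisions stay with the MOM macros on one side and the style
sheet on the other.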

Peter has pointed out that MOM can be used to delineate
structure, at least in documents where only high-level MOM macros
are used (and not low-level groff requests) with a discipline
aiming at strict semantics. And from that, as he also points out,
it's not hard to use a scripting language to produce a valid HTML
document.
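
A rough sketch of such a script, in Python, assuming a source that
uses only .HEADING and .PP for its body and ignoring every other
control line. The function name is made up for illustration, groff
escapes inside the text are not interpreted, and a real filter would
have to cover far more of the macro set.

```python
import re
from html import escape

# Matches lines such as: .HEADING 2 "Some heading text"
HEADING = re.compile(r'^\.HEADING\s+([1-6])\s+"(.*)"\s*$')


def mom_to_html(source):
    """Translate a disciplined subset of mom source into an HTML fragment."""
    out, para = [], []

    def flush():
        # Close the paragraph collected so far, if there is one.
        if para:
            out.append("<p>" + escape(" ".join(para)) + "</p>")
            para.clear()

    for line in source.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = match.group(1)
            out.append(f"<h{level}>{escape(match.group(2))}</h{level}>")
        elif line.strip() == ".PP":
            flush()
        elif line.startswith((".", "'")):
            continue  # any other macro or request: skipped in this sketch
        elif line.strip():
            para.append(line.strip())
    flush()
    return "\n".join(out)
```

This only works because the source sticks to semantic macros; once
low-level requests for spacing or positioning appear, there is nothing
sensible to map them to.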

A relatively simple notation like Markdown would also work, and
XML tagging is often used as well. But in both cases you'll need
a fairly sophisticated script for filtering, especially for the
groff output, since I don't know of any good generally available
tools for this. Such an effort makes the most sense if one adopts
this kind of a document-making procedure for regular use.

When I use XML for this purpose, I try to keep the tag set
relatively small, but broad enough to cover the major semantic
elements of the document, e.g., for a book, including tags whose
names clearly identify footnotes or endnotes, epigraphs,
copyright information, bibliographic and index entries, and more
specific author details. A sufficient tag set could be xhtml plus
the above.

I can keep my documents as valid XML but still insert special
typographical instructions either as processing instructions (a
last resort) or as attributes to a tag. For example, to track-kern
a paragraph I'll use something like <p kern="-.15p"> and have my
XML parser pass the attribute name and value to the paragraph
macro.
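
The attribute hand-off can be sketched like this in Python, using the
standard library's ElementTree parser. The .PARA macro is hypothetical:
it is not part of MOM or groff and would have to be defined to accept
name/value pairs as arguments. The function names are likewise only
for illustration.

```python
import xml.etree.ElementTree as ET


def para_to_groff(elem):
    """Turn one <p> element into a call to a hypothetical .PARA macro.

    Each attribute becomes a "name value" argument pair. Values that
    contain spaces would need quoting, which this sketch does not do.
    """
    args = " ".join(f"{name} {value}" for name, value in elem.attrib.items())
    # Collapse line breaks and runs of spaces inside the paragraph text.
    text = " ".join("".join(elem.itertext()).split())
    call = ".PARA" + (" " + args if args else "")
    return f"{call}\n{text}"


def convert(xml_source):
    """Convert every <p> in a well-formed XML document, in document order."""
    root = ET.fromstring(xml_source)
    return "\n".join(para_to_groff(p) for p in root.iter("p")) + "\n"


if __name__ == "__main__":
    sample = '<doc><p kern="-.15p">A tight paragraph.</p><p>A plain one.</p></doc>'
    print(convert(sample), end="")
```

The parser needs no knowledge of what kern means; it passes the
pair through and the macro decides what to do with it.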

        -- Steve

-- 
Steve Izma                                si...@golden.net
    Home:  35 Locust St., Kitchener ON        519-745-1313 
           Canada, N2H 1W6              cell: 519-998-2684
