On Wed, Aug 16, 2017 at 08:28:53PM +0200, Ingo Schwarze wrote:
> Subject: Re: [Groff] Regarding HTML rendering
> ...
> The roff language is a poor fit for what HTML excels in,
> namely, hierarchical representation of information and semantic
> markup. The HTML language is a poor fit for what groff excels
> in, namely, exact positioning of glyphs and lines on paper. So
> the programmer is likely to spend lots of time trying to write
> heuristic code to somehow transform the linear flow of pure
> formatting instruction roff provides into something structured
> and semantically enriched. Yet the user will likely be
> disappointed because they won't find the precision and elegance
> they are used to from groff PostScript and PDF output in the
> HTML result.
While I agree with the above analysis completely, I also suggest dealing with the problem by abstracting one level up. The roff language is for typesetting on a cut sheet of paper, whereas HTML is for presentation on a scrolling screen. The only thing they have in common is the content, and attempts to display the content identically on such different media do not result in a pleasant experience for the reader. The three addresses across the top of the letter Mikkel produced are a case in point: for paper output, you know the dimensions within which the three headings can reasonably fit; for output meant for reading in a browser, you have no idea what the dimensions of the screen will be, and the right-hand address might disappear completely if set flush right.

Anyone who intends to output content to such different media should use some sort of higher-level markup whose purpose is to delineate the structure of the document, then filter that through the typographic system appropriate for each output medium. For paper output, the filter would produce a MOM source file, and for Web-based output a (probably fairly simple) HTML file. All formatting for the former would be determined by the MOM macro package, and for the latter by CSS style sheets, which one would probably need to write.

Peter has pointed out that MOM can be used to delineate structure, at least in documents that use only high-level MOM macros (and not low-level groff requests) with a discipline aiming at strict semantics. And from that, as he also points out, it's not hard to use a scripting language to produce a valid HTML document. A relatively simple notation like Markdown would also work, and XML tagging is often used as well. But in both cases you'll need a fairly sophisticated filtering script, especially for the groff output, since I don't know of any good, generally available tools for this.
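A minimal sketch of the kind of filter described above, assuming Python's standard library and a hypothetical three-element tag set (`title`, `heading`, `p`): the same XML source is walked once per medium, emitting mom macros for paper and plain HTML for the screen. The style sheet name `book.css` is hypothetical, and the mom side assumes a release that provides `.HEADING` (older mom releases used different heading macros). Neither output makes any layout decisions of its own.

```python
import html
import xml.etree.ElementTree as ET

# A tiny structured source. The tag set (doc, title, heading, p) and the
# stylesheet name used below are hypothetical, chosen only for illustration.
SOURCE = """<doc>
  <title>Regarding HTML rendering</title>
  <heading>Two media, one source</heading>
  <p>Paper has fixed dimensions.</p>
  <p>Screens do not.</p>
</doc>"""


def roff_text(s: str) -> str:
    """Collapse whitespace and escape text so groff prints it literally."""
    s = " ".join(s.split()).replace("\\", "\\[rs]")
    if s.startswith((".", "'")):
        s = "\\&" + s  # a leading . or ' would otherwise start a request
    return s


def roff_arg(s: str) -> str:
    """Quote a macro argument; \\[dq] stands in for a literal double quote."""
    return '"' + roff_text(s).replace('"', "\\[dq]") + '"'


def to_mom(root: ET.Element) -> str:
    """Emit mom source: structure only, all layout left to the macro package."""
    out = [".PRINTSTYLE TYPESET",
           ".TITLE " + roff_arg(root.findtext("title", default="")),
           ".START"]
    for el in root:
        text = el.text or ""
        if el.tag == "heading":
            out.append(".HEADING 1 " + roff_arg(text))
        elif el.tag == "p":
            out += [".PP", roff_text(text)]
    return "\n".join(out) + "\n"


def to_html(root: ET.Element) -> str:
    """Emit plain HTML: structure only, all layout left to a CSS style sheet."""
    title = html.escape(root.findtext("title", default=""))
    out = ["<!DOCTYPE html>", "<html>", "<head>",
           '<meta charset="utf-8">',
           f"<title>{title}</title>",
           '<link rel="stylesheet" href="book.css">',
           "</head>", "<body>", f"<h1>{title}</h1>"]
    for el in root:
        text = html.escape(" ".join((el.text or "").split()))
        if el.tag == "heading":
            out.append(f"<h2>{text}</h2>")
        elif el.tag == "p":
            out.append(f"<p>{text}</p>")
    out += ["</body>", "</html>"]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    doc = ET.fromstring(SOURCE)
    print(to_mom(doc))
    print(to_html(doc))
```

Each additional semantic tag means one more branch in each output function, which is roughly where the "fairly sophisticated" part of the work comes in.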
Such an effort makes the most sense if one adopts this kind of document-making procedure for regular use. When I use XML for this purpose, I try to keep the tag set relatively small, but broad enough to cover the major semantic elements of the document: for a book, e.g., tags whose names clearly identify footnotes or endnotes, epigraphs, copyright information, bibliographic and index entries, and more specific author details. A sufficient tag set could be XHTML plus the above. I can keep my documents as valid XML but still insert special typographical instructions, either as processing instructions (a last resort) or as attributes to a tag. For example, to track-kern a paragraph I'll use something like <p kern="-.15p"> and have my XML parser pass the attribute name and value to the paragraph macro.

-- Steve

-- 
Steve Izma
si...@golden.net
Home: 35 Locust St., Kitchener ON    519-745-1313
Canada, N2H 1W6    cell: 519-998-2684
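The attribute pass-through for a paragraph like <p kern="-.15p"> might look like the following minimal sketch, again in Python with only the standard library. The macro name `.PARA` and its convention of taking attribute name/value pairs as arguments are hypothetical: a user-defined paragraph macro would have to interpret those pairs (for `kern`, presumably by adjusting track kerning), and nothing here implies that mom's own `.PP` accepts such arguments.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph macro: a user-defined macro that accepts
# attribute name/value pairs as arguments (mom's own .PP is not assumed to).
PARA_MACRO = ".PARA"


def quote_arg(s: str) -> str:
    """Quote a roff macro argument, escaping backslashes and double quotes."""
    return '"' + s.replace("\\", "\\[rs]").replace('"', "\\[dq]") + '"'


def paragraph_to_roff(el: ET.Element) -> str:
    """Pass every attribute of <p> through to the paragraph macro unchanged."""
    args = []
    for name, value in el.attrib.items():  # attribute order is preserved
        args += [quote_arg(name), quote_arg(value)]
    call = " ".join([PARA_MACRO] + args)
    text = " ".join("".join(el.itertext()).split()).replace("\\", "\\[rs]")
    if text.startswith((".", "'")):
        text = "\\&" + text  # keep roff from reading the line as a request
    return call + "\n" + text


if __name__ == "__main__":
    p = ET.fromstring('<p kern="-.15p">A tightly set paragraph.</p>')
    print(paragraph_to_roff(p))
```

Passing attributes through untouched keeps the filter ignorant of typography: the macro package alone decides which names it understands and what they do.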