Re: [Groff] Applications of \c in man pages in the wild [LONG]

Ingo Schwarze Mon, 01 May 2017 08:47:26 -0700

Hi Branden,

G. Branden Robinson wrote on Sun, Apr 30, 2017 at 07:51:26PM -0400:


> some of these categories are going to be hard to recognize without
> a standalone *roff parser, which I don't think exists.

I'm working on that in mandoc, albeit rather slowly.

Mandoc is a four-phase program.  The order of the phases is always
the same, but command line options may cause individual phases to be
skipped:

  phase 1: file selection (e.g. database or file system search)
           [skipped in "man -l"]
  loop over selected files
    phase 2: parser (for roff, mdoc, man, tbl, eqn)
             [skipped in "man -k" unless "-a" is given]
    phase 3: formatter (ascii, utf8, html, ps, pdf, man, markdown)
             [skipped in "man -Tlint"]
  phase 4: pager (usually more(1) or less(1))
           [skipped in "man -c"]
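
As a rough illustration only (this is not mandoc code), the skip rules
listed above can be written down as a small Python sketch.  The `Options`
fields and the `phases_run` function are hypothetical names invented for
this example.  The sketch encodes only the conditions stated in the list
and does not model how the real options interact otherwise.

```python
from dataclasses import dataclass


@dataclass
class Options:
    # Hypothetical flags, named after the command lines quoted above.
    local_file: bool = False   # "man -l": skip phase 1
    search_only: bool = False  # "man -k": skip phase 2 ...
    all_pages: bool = False    # ... unless "-a" is given
    lint: bool = False         # "man -Tlint": skip phase 3
    no_pager: bool = False     # "man -c": skip phase 4


def phases_run(opts):
    """Return the numbers of the phases that are not skipped, in order."""
    run = []
    if not opts.local_file:
        run.append(1)  # file selection
    if not opts.search_only or opts.all_pages:
        run.append(2)  # parser
    if not opts.lint:
        run.append(3)  # formatter
    if not opts.no_pager:
        run.append(4)  # pager
    return run
```

The order of the checks is fixed, mirroring the statement that the order
of the phases never changes and options only remove individual phases.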

The output of phase 2 that is passed as input to phase 3 is an
abstract syntax tree, either an mdoc(7) AST or a man(7) AST.

Phase 2 currently consists of three sub-phases:

  phase 2.1: roff(7) preprocessor
             input: roff(7) text file
             output: mdoc(7) or man(7) text file
               no longer containing any low-level roff(7) elements;
               for example, all register, string, and macro
               definitions and interpolations are evaluated
               and expanded, etc.
  phase 2.2: mdoc(7) or man(7) parser
             generating the raw AST
  phase 2.3: mdoc(7) or man(7) normalizing validator
             modifying the AST
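
To make the division of labour concrete, here is a toy Python sketch of
the three sub-phases.  It is an assumption-laden illustration, not how
mandoc is implemented: the function names, the dict-based node layout,
and the "merge adjacent text nodes" normalization are all invented for
this example, and the preprocessor handles only `.ds` string definitions
and `\*[name]` interpolations.

```python
import re

# Matches a string interpolation of the form \*[name].
INTERPOLATION = re.compile(r"\\\*\[([^\]]+)\]")


def preprocess(lines):
    """Phase 2.1 (toy): expand .ds definitions and string interpolations."""
    strings = {}
    out = []
    for line in lines:
        # Replace each interpolation with the stored value
        # (an undefined string expands to nothing).
        line = INTERPOLATION.sub(lambda m: strings.get(m.group(1), ""), line)
        m = re.match(r"\.ds\s+(\S+)\s*(.*)", line)
        if m:
            strings[m.group(1)] = m.group(2)
            continue  # the definition itself produces no output line
        out.append(line)
    return out


def parse(lines):
    """Phase 2.2 (toy): turn macro and text lines into a flat raw AST."""
    nodes = []
    for line in lines:
        parts = line[1:].split() if line.startswith(".") else []
        if parts:
            nodes.append({"type": "macro", "name": parts[0], "args": parts[1:]})
        else:
            nodes.append({"type": "text", "text": line})
    return nodes


def validate(nodes):
    """Phase 2.3 (toy): normalize the AST by merging adjacent text nodes."""
    merged = []
    for node in nodes:
        if node["type"] == "text" and merged and merged[-1]["type"] == "text":
            merged[-1]["text"] += " " + node["text"]
        else:
            merged.append(node)
    return merged


# Hypothetical input mixing a low-level roff request with man(7) macros.
sample = [".ds N mandoc", ".SH NAME", "\\*[N] is a", "manual compiler"]
ast = validate(parse(preprocess(sample)))
```

Note that after `preprocess` no trace of the `.ds` request remains in the
data handed to `parse`, which matches the description of phase 2.1 output
as no longer containing any low-level roff(7) elements.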

Over roughly the last two years, i have unified all the data
structures and node handling utility functions, and i'm now near the
point where the roff(7) preprocessor can slowly begin to evolve
into a real pre-parser: that is, where it can begin to add low-level
roff(7) nodes to the AST in addition to preprocessing the input.

Even in an intermediate state where only some roff constructs are
parsed into the AST, that concept may start to become useful for
syntactic and semantic analysis of mixed roff(7)/mdoc(7) and mixed
roff(7)/man(7) sources.
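
A purely hypothetical Python sketch of what such an analysis could look
like: a pre-parser that records selected low-level requests as "roff"
nodes next to ordinary macro and text nodes, and a pass that counts them.
The tuple-based node layout, the function names, and the choice of
requests are assumptions made up for this example; nothing here reflects
an existing mandoc interface.

```python
from collections import Counter

# A few low-level roff requests that this toy pre-parser recognizes.
ROFF_REQUESTS = {"ds", "nr", "de", "if", "ie", "el"}


def pre_parse(lines):
    """Hypothetical pre-parser: keep low-level requests as 'roff' nodes."""
    nodes = []
    for line in lines:
        parts = line[1:].split() if line.startswith(".") else []
        if parts and parts[0] in ROFF_REQUESTS:
            nodes.append(("roff", parts[0], parts[1:]))
        elif parts:
            nodes.append(("macro", parts[0], parts[1:]))
        else:
            nodes.append(("text", line, []))
    return nodes


def roff_usage(nodes):
    """Count how often each low-level roff request occurs in the tree."""
    return Counter(name for kind, name, _ in nodes if kind == "roff")
```

Because the roff nodes stay in the tree instead of being expanded away, a
survey of which low-level constructs a manual page uses becomes a simple
tree walk rather than a separate text scan.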

Roff nodes in the AST are not available yet, though.  So far, i
only established the technological foundation to build them on, as
described above.

Yours,
  Ingo
