Hi Branden,

G. Branden Robinson wrote on Sun, Apr 30, 2017 at 07:51:26PM -0400:
> some of these categories are going to be hard to recognize without
> a standalone *roff parser, which I don't think exists.

I'm working on that in mandoc, albeit rather slowly.

Mandoc is a four-phase program. The order of the phases is always
the same, but command line options may cause individual phases to be
skipped:

 phase 1: file selection (e.g. database or file system search)
          [skipped in "man -l"]
 loop over selected files
 phase 2: parser (for roff, mdoc, man, tbl, eqn)
          [skipped in "man -k" unless "-a" is given]
 phase 3: formatter (ascii, utf8, html, ps, pdf, man, markdown)
          [skipped in "man -Tlint"]
 phase 4: pager (usually more(1) or less(1))
          [skipped in "man -c"]

The output of phase 2 that is passed as input to phase 3 is an
abstract syntax tree (AST), either an mdoc(7) AST or a man(7) AST.

Phase 2 currently consists of three sub-phases:

 phase 2.1: roff(7) preprocessor
            input: roff(7) text file
            output: mdoc(7) or man(7) text file no longer containing
              any low-level roff(7) elements; for example, all
              register, string, and macro definitions and
              interpolations are evaluated and expanded
 phase 2.2: mdoc(7) or man(7) parser generating the raw AST
 phase 2.3: mdoc(7) or man(7) normalizing validator modifying the AST

During roughly the last two years, I have already unified all the
data structures and node handling utility functions, and I am now
close to the point where the roff(7) preprocessor can gradually begin
to evolve into a real pre-parser: that is, one that adds low-level
roff(7) nodes to the AST in addition to preprocessing the input.
Even in an intermediate state where only some roff constructs are
parsed into the AST, that concept may start to become useful for
syntactic and semantic analysis of mixed roff(7)/mdoc(7) and mixed
roff(7)/man(7) sources.

Roff nodes in the AST are not available yet, though. So far, I have
only established the technological foundation to build them on, as
described above.

Yours,
  Ingo
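To make the difference between the phase 2.1 preprocessor and the planned pre-parser more concrete, here is a toy sketch in Python. It is a hypothetical illustration, not mandoc's code (mandoc is written in C): the names `preprocess`, `expand`, `Node`, and the `keep_roff_nodes` switch are invented for this example. The sketch only handles string definitions (`.ds NAME VALUE`) and the `\*x`, `\*(xx`, and `\*[name]` interpolations, and ignores registers, macros, and all other escapes.

```python
import re
from dataclasses import dataclass

# Hypothetical toy model of phase 2.1; not mandoc's actual C code.
# Only ".ds NAME VALUE" and the \*x, \*(xx, \*[name] interpolations
# are handled; registers, macros, and other escapes are ignored.
DS_REQUEST = re.compile(r"^\.ds\s+(\S+)\s*(.*)$")
STRING_INTERP = re.compile(r"\\\*(?:\[([^\]]+)\]|\((..)|(.))")


@dataclass
class Node:
    kind: str  # "roff" for a kept low-level request, "text" otherwise
    text: str


def expand(text, strings):
    """Replace each string interpolation with its current definition."""
    def lookup(m):
        name = m.group(1) or m.group(2) or m.group(3)
        return strings.get(name, "")  # undefined strings expand to ""
    return STRING_INTERP.sub(lookup, text)


def preprocess(lines, keep_roff_nodes=False):
    """Evaluate .ds requests and expand interpolations line by line.

    keep_roff_nodes=False: pure preprocessor; the output contains no
    low-level roff elements.  keep_roff_nodes=True: each .ds request
    is also kept as a "roff" node, as a pre-parser might do.
    """
    strings = {}
    out = []
    for line in lines:
        m = DS_REQUEST.match(line)
        if m:
            value = m.group(2)
            if value.startswith('"'):
                value = value[1:]  # a leading quote allows leading spaces
            # Interpolations inside the value are expanded at definition.
            strings[m.group(1)] = expand(value, strings)
            if keep_roff_nodes:
                out.append(Node("roff", line))
            continue
        out.append(Node("text", expand(line, strings)))
    return out
```

For example, the input `.ds Pr mandoc` followed by `The \*[Pr] parser.` yields a single text node `The mandoc parser.` in the first mode; in the second mode, the same text node is preceded by a roff node holding the `.ds` request, which is the kind of information a syntactic checker could use.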