> I dimly perceive a lot of good infrastructure there that could be put to
> some excellent use. We just need a contributor to pick up the mission.

This has been on my to-do list for Roff.js
<https://github.com/Alhadis/Roff.js> for quite some time (ETA hopefully
this *decade*). Once/if I finish a POC, I'd be happy to reimplement it in
Perl and submit it to Groff. I've a solid idea in my head of how infer(1)
would work, I just haven't gotten off my arse to implement it yet... :-)

On Sun, 27 Sep 2020 at 16:49, G. Branden Robinson
<g.branden.robin...@gmail.com> wrote:

> At 2020-09-18T01:18:06+1000, John Gardner wrote:
> > To preserve metadata, or identify regions of semantic or structural
> > interest, write a preprocessor to delineate unprocessed roff(7) syntax
> > with device control functions:
> >
> > .TH \X'meta: begin title'TITLE\X'end title'
>
> Yes.
>
> > Which comes out looking like this in troff's intermediate output:
> >
> > x X meta: begin title
> > t TITLE
> > x X meta: end title
> >
> > Which postprocessors can use if they have some reason to care about
> > semantic data.
>
> Yes!
>
> > Even if you only care about extracting abstract info instead of
> > rendering a document, there's no reason a postprocessor actually has
> > to be a typesetter:
> >
> > $ infer | troff | post-infer --extract-outline --xml ./outline.xml
> > | grotty | less
> >
> > Of course, this would require infer to have prior knowledge of
> > specific macro packages, but I fail to see that being an issue.
> > Moreover, infer can also identify preprocessor markup, such as tables,
> > pictures, equations, and any other shite that's impossible to
> > recognise in preprocessor output.
> >
> > This is similar in spirit to what Werner Lemberg started with
> > devtag.tmac, which grohtml(1) already uses to identify numbered
> > headings and section titles, Personally, there's a lot more we could
> > be doing with that same technique.
>
> Yes! Yes! Yes!
>
> I've poked my snout a little bit into grohtml recently because I had to
> test my changes to the handling of the man(7) registers (C, D, P, X)
> that aren't honored when the output device is -Thtml. I dimly perceive
> a lot of good infrastructure there that could be put to some excellent
> use. We just need a contributor to pick up the mission.
>
> Regards,
> Branden
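
For what it's worth, the postprocessor half of the idea quoted above can be sketched in a few lines. This is only an illustration, not the infer(1) or post-infer design (neither exists yet). It assumes both markers carry the "meta:" prefix exactly as in the intermediate output shown in the thread, and that every troff command sits on its own line. The function name extract_regions is hypothetical.

```python
import re
import sys

# Assumed marker shape, copied from the example in the thread:
#   x X meta: begin <name>
#   x X meta: end <name>
MARKER = re.compile(r"^x X meta: (begin|end) (.+?)\s*$")


def extract_regions(lines):
    """Collect the text of `t` commands between matching meta markers.

    Returns a list of (name, text) tuples, in the order regions close.
    """
    regions = []
    open_regions = {}  # region name -> words seen since its "begin"
    for line in lines:
        line = line.rstrip("\n")
        match = MARKER.match(line)
        if match:
            kind, name = match.groups()
            if kind == "begin":
                open_regions[name] = []
            elif name in open_regions:
                regions.append((name, " ".join(open_regions.pop(name))))
            continue
        if line.startswith("t"):
            # `t` typesets a word; accept both "tWORD" and "t WORD".
            word = line[1:].strip()
            for words in open_regions.values():
                words.append(word)
    return regions


if __name__ == "__main__":
    # Reads intermediate output on stdin and prints one region per line.
    for name, text in extract_regions(sys.stdin):
        print(f"{name}\t{text}")
```

Something like this could sit at the end of a pipeline that feeds it troff's intermediate output. A real tool would also need to handle nested regions with the same name and multiple commands per line, which this sketch ignores.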