[following up on my email of 9 March, but sending only to groff@]

I have some happy announcements to make and questions to ask of this list's subscribers.

In my previous status email I enumerated several problems with getting to a groff 1.24.0 release candidate.
All of them, more or less, are now resolved.

* I noted Savannah #66675 as a trouble spot. With today's push, it's fixed. The ultimate resolution was simple. Dave Kemper has been extremely helpful in identifying problems and regressions, catching me out in misconceptions, and compelling me to make sense after I fail to explain things cogently.

* All of the debugging features I mused about, except details of the character resolution process, are now implemented. I'll share their man page descriptions and illustrate with example shell sessions.

1. You can demand information about any ordinary, special, or indexed character.

groff(7):
     .pchar c ...
            Report, to the standard error stream, information about each ordinary or special character c. A character defined by a request (char, fchar, fschar, or schar), reports its contents as a JSON‐encoded string, but the output is not otherwise in JSON format.

$ groff
.pchar a
character 'a'
  is not translated
  does not have a macro
  special translation: 0
  hyphenation code: 97
  flags: 0
  ASCII code: 97
  asciify code: 0
  is found
  is transparently translatable
  is not translatable as input
  mode: normal
.pchar \['a]
special character "'a"
  is not translated
  does not have a macro
  special translation: 0
  hyphenation code: 97
  flags: 0
  ASCII code: 0
  asciify code: 225
  is found
  is transparently translatable
  is translatable as input
  mode: normal
.pchar \N'65'
character indexed 65 in current font
  is not translated
  does not have a macro
  special translation: 0
  hyphenation code: 0
  flags: 0
  ASCII code: 0
  asciify code: 0
  is found
  is transparently translatable
  is not translatable as input
  mode: normal
.char \[happy] :-)
.pchar \[happy]
special character "happy"
  is not translated
  has a macro: "contents": ":-)"
  special translation: 0
  hyphenation code: 0
  flags: 0
  ASCII code: 0
  asciify code: 0
  is found
  is transparently translatable
  is not translatable as input
  mode: normal

2. The new `pline` request is now much, much more powerful.
Because a node list is really a tree structure, to accurately report the node list corresponding to a pending input line, we needed recursive node dumping operations. Now we have them. groff(7): .pline Report, in JSON syntax to the standard error stream, the list of output nodes corresponding to the pending output line. In JSON, a pair of empty brackets “[ ]” represents an empty list. $ printf 'Check out this \\%%Bu\[~n]uel flick.\n.pline\n' | ./build/test-groff -z [{"type": "line_start_node", "diversion level": 0, "is_special_node": false}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "C"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "h"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "e"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "c"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "k"}, {"type": "word_space_node", "diversion level": 0, "is_special_node": false, "hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, "terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width": 2500 }], "unformat": false}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "o"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "u"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "t"}, {"type": "word_space_node", "diversion level": 0, "is_special_node": false, "hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, "terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width": 2500 }], "unformat": false}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "t"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "h"}, {"type": "glyph_node", "diversion 
level": 0, "is_special_node": false, "character": "i"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "s"}, {"type": "word_space_node", "diversion level": 0, "is_special_node": false, "hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, "terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width": 2500 }], "unformat": false}, {"type": "hyphen_inhibitor_node", "diversion level": 0, "is_special_node": false}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "B"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "u"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "special character": "~n"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "u"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "e"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "l"}, {"type": "word_space_node", "diversion level": 0, "is_special_node": false, "hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, "terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width": 2500 }], "unformat": false}, {"type": "ligature_node", "diversion level": 0, "is_special_node": false, "n1": {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "f"}, "n2": {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "l"}}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "i"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "c"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "k"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "."}, {"type": "word_space_node", "diversion level": 0, "is_special_node": false, 
"hunits": 5000, "undiscardable": false, "is hyphenless breakpoint": false, "terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width": 2500 }], "unformat": false}] That's a lot. Send the standard error stream to jq(1) to make the tree structure more obvious. $ printf 'Check out this \\%%Bu\[~n]uel flick.\n.pline\n' \ | ./build/test-groff -z 2>&1 | jq [ { "type": "line_start_node", "diversion level": 0, "is_special_node": false }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "C" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "h" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "e" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "c" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "k" }, { "type": "word_space_node", "diversion level": 0, "is_special_node": false, "hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, "terminal_color": "default", "width_list": [ { "width": 2500, "sentence_width": 2500 } ], "unformat": false }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "o" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "u" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "t" }, { "type": "word_space_node", "diversion level": 0, "is_special_node": false, "hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, "terminal_color": "default", "width_list": [ { "width": 2500, "sentence_width": 2500 } ], "unformat": false }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "t" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "h" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "i" }, { 
"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "s" }, { "type": "word_space_node", "diversion level": 0, "is_special_node": false, "hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, "terminal_color": "default", "width_list": [ { "width": 2500, "sentence_width": 2500 } ], "unformat": false }, { "type": "hyphen_inhibitor_node", "diversion level": 0, "is_special_node": false }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "B" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "u" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "special character": "~n" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "u" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "e" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "l" }, { "type": "word_space_node", "diversion level": 0, "is_special_node": false, "hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, "terminal_color": "default", "width_list": [ { "width": 2500, "sentence_width": 2500 } ], "unformat": false }, { "type": "ligature_node", "diversion level": 0, "is_special_node": false, "n1": { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "f" }, "n2": { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "l" } }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "i" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "c" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "k" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "." 
}, { "type": "word_space_node", "diversion level": 0, "is_special_node": false, "hunits": 5000, "undiscardable": false, "is hyphenless breakpoint": false, "terminal_color": "default", "width_list": [ { "width": 2500, "sentence_width": 2500 } ], "unformat": false } ]

3. The `pm` request now (optionally) accepts a list of names to dump. (Its behavior when given no arguments is unchanged.)

groff(7):
     .pm    Report, to the standard error stream, the names of all defined macros, strings, and diversions and their sizes in bytes.

     .pm name ...
            Report, to the standard error stream, the name and JSON‐encoded contents of each macro, string, or diversion name.

$ printf '.ds mystring " hello, \\[dq]world\\[dq]\n.pm mystring\n' \
    | ./build/test-groff -ms
{"name": "mystring", "contents": " hello, \\[dq]world\\[dq]"}

Caution: a single backslash has to be escaped both for printf(1) on the way in, and for correct JSON representation on the way out. So there's really only one backslash before each `[dq]` in this example. With that in mind, we can see that string definitions are read in copy mode just as the documentation has always claimed. Also observe the leading space in the string contents.

$ echo '.pm LP' | ./build/test-groff -ms
{"name": "LP", "contents": ".if !'\\n[.z]'' \u0016\u0011.\tbr\n.di\n.\u0017\n.br\n.cov*ab-init\n.cov*print\n.nop \\*[\\$0]\\\n"}

The disclosure of GNU troff's encoding technique for certain tokens is a mixed blessing. On the one hand, no one can be expected to know what these JSON-encoded C0 control characters mean off the top of their head, and they'll have to consult "src/roff/troff/input.h" in the groff source tree to decode them. On the other hand, exposure of this information, formerly impossible outside of a GDB session, should be a boon to developers and ambitious macro programmers.

4. Did you notice the word "diversions" in the previous item?
Implementing this feature cleared up some confusion I had about the nature of the `macro_header` class inside GNU troff. In my earlier message I wondered why it contained objects of both `char_list` and `node_list` types. Now I know. These could have been wrapped in a C/C++ `union`. (In Ada, we'd use a "discriminated record".) Macros and strings use only the `char_list`. Diversions use only the `node_list`. This made implementation of the dumping feature straightforward. It also means that diversion dumping can be even more chatty than dumping the pending output line node list. Here's the example I put in the commit message. $ printf '.di foo\nABC.\n.sp\nDEF\n.br\n.di\n.pm foo\n' \ | build/test-groff -z 2>&1 {"name": "foo", "contents": [{"type": "line_start_node", "diversion level": 0, "is_special_node": false}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "A"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "B"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "C"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "."}, {"type": "vertical_size_node", "diversion level": 0, "is_special_node": false, "vunits": -12000}, {"type": "vertical_size_node", "diversion level": 0, "is_special_node": false, "vunits": 0}, {"type": "diverted_space_node", "diversion level": 0, "is_special_node": false, "vunits": 12000}, {"type": "line_start_node", "diversion level": 0, "is_special_node": false}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "D"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "E"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "F"}, {"type": "vertical_size_node", "diversion level": 0, "is_special_node": false, "vunits": -12000}, {"type": "vertical_size_node", "diversion level": 0, "is_special_node": false, 
"vunits": 0}]} $ printf '.di foo\nABC.\n.sp\nDEF\n.br\n.di\n.pm foo\n' \ | build/test-groff -z 2>&1 | jq { "name": "foo", "contents": [ { "type": "line_start_node", "diversion level": 0, "is_special_node": false }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "A" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "B" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "C" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "." }, { "type": "vertical_size_node", "diversion level": 0, "is_special_node": false, "vunits": -12000 }, { "type": "vertical_size_node", "diversion level": 0, "is_special_node": false, "vunits": 0 }, { "type": "diverted_space_node", "diversion level": 0, "is_special_node": false, "vunits": 12000 }, { "type": "line_start_node", "diversion level": 0, "is_special_node": false }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "D" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "E" }, { "type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "F" }, { "type": "vertical_size_node", "diversion level": 0, "is_special_node": false, "vunits": -12000 }, { "type": "vertical_size_node", "diversion level": 0, "is_special_node": false, "vunits": 0 } ] } In practice, a diversion may contain up to an entire page of formatted text, so I expect their dumps to potentially be really huge. But the user can now inspect them in minute detail. Next steps: * I need to know from this community what, if anything, should now gate RC1. I don't plan on a code freeze until RC2, but I don't want to mess with the formatter anymore, except to possibly do one thing I've already worked up and tested. * Review the Savannah 1.24.0 release goals ticket. 
https://savannah.gnu.org/bugs/?65099

Deri's patiently been awaiting my feedback on his contribution of PDF superpowers to the ms package, which could easily be added to the goals. As illustrated on this list, it seems to work fine with the reasonably complex ms.ms document.

Getting first-class PDF support into all our full-service macro packages is, I think, a prerequisite to making the default output device PDF. Maybe more than a "prerequisite": once we have that support, I'm finding it hard to imagine reasons _not_ to change the default output device thus.

I don't think we'll get the same support into groff_mm(7) or groff_me(7) in time for 1.24.0. (Nobody's working on these tasks, and I don't want to wait/gate on myself to take care of them.)

* The one respect in which I'm contemplating still changing the formatter itself is this:

diff --git a/src/roff/troff/env.cpp b/src/roff/troff/env.cpp
index 37dd7954c..3fcc6c098 100644
--- a/src/roff/troff/env.cpp
+++ b/src/roff/troff/env.cpp
@@ -2543,6 +2543,8 @@ void environment::do_break(bool want_adjustment)
         break;
       }
     }
+    if (getenv("GROFF_DUMP") != 0 /* nullptr */)
+      curenv->dump_pending_nodes();
     node *tem = line;
     line = 0 /* nullptr */;
     output_line(tem, width_total, was_centered);

That's all.

What does this do? It tells GNU troff to do the equivalent of `pline` every time it's about to perform a break.

What does that mean? You get a complete node graph of your document. Because, like Osiris, a *roff's node-generation procedure dies and is born again with every new output line,[1] this graph is, more precisely, a linear forest: a list of trees. (A hedgerow? Bustling since 1971?)

This is something mandoc(1) has had for years. Now we can have it too.

The reason I haven't already committed this is that it requires an interface decision. Use an environment variable? If so, named what? Use a command-line option? If so, which letter do we want to permanently eat for it?
Not many are available:

groff(1):
       groff [-abcCeEgGijklNpRsStUVXzZ] [-d ctext] [-d string=text]
             [-D fallback‐encoding] [-f font‐family]
             [-F font‐directory] [-I inclusion‐directory]
             [-K input‐encoding] [-L spooler‐argument]
             [-m macro‐package] [-M macro‐directory] [-n page‐number]
             [-o page‐list] [-P postprocessor‐argument]
             [-r cnumeric‐expression] [-r register=numeric‐expression]
             [-T output‐device] [-w warning‐category]
             [-W warning‐category] [file ...]

I want to hear your feedback on all of the questions above.

Regards,
Branden

[1] I'll bet the main reason for this was to reduce the memory footprint of the implementation back in core-starved PDP-11 days.
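
Postscript to item 4: the remark that the `char_list` and `node_list` members of `macro_header` could have been wrapped in a union may be easier to picture with a small sketch. The C++17 code here is not GNU troff's actual code. `Node`, `CharList`, `NodeList`, `MacroContents`, `count_nodes`, and `summarize` are hypothetical names invented for illustration, and `std::variant` stands in for a hand-rolled tagged union. It shows one object holding either character data (as a macro or string would) or a tree of nodes (as a diversion would), plus a recursive walk over the latter.

```cpp
// Hypothetical sketch only; none of these names exist in GNU troff.
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Stand-in for an output node. A node may own child nodes, the way a
// ligature node in the dumps above owns the glyph nodes "n1" and "n2".
struct Node {
  std::string type;
  std::vector<Node> children;
};

using CharList = std::string;        // assumed: contents of a macro or string
using NodeList = std::vector<Node>;  // assumed: contents of a diversion

// Exactly one alternative is live at a time, like a tagged union or an
// Ada discriminated record.
struct MacroContents {
  std::variant<CharList, NodeList> body;

  bool is_diversion() const {
    return std::holds_alternative<NodeList>(body);
  }
};

// Count every node in the list, descending into children recursively.
std::size_t count_nodes(const NodeList &list) {
  std::size_t total = 0;
  for (const Node &node : list)
    total += 1 + count_nodes(node.children);
  return total;
}

// Build a one-line summary by dispatching on the live alternative.
std::string summarize(const MacroContents &m) {
  if (const CharList *chars = std::get_if<CharList>(&m.body))
    return "characters: " + std::to_string(chars->size());
  const NodeList *nodes = std::get_if<NodeList>(&m.body);
  assert(nodes != nullptr);  // the variant has only two alternatives
  return "nodes: " + std::to_string(count_nodes(*nodes));
}
```

A dumper built on this shape only needs to ask which alternative is live; the recursion then handles nested nodes such as the two glyphs inside a ligature node.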