groff 1.24.0 status report and questions for next steps

G. Branden Robinson Sat, 22 Mar 2025 18:16:27 -0700

[following up on my email of 9 March, but sending only to groff@]

I have some happy announcements to make and questions to ask of this
list's subscribers.  In my previous status email I enumerated several
problems with getting to a groff 1.24.0 release candidate.


All of them, more or less, are now resolved.

* I noted Savannah #66675 as a trouble spot.  With today's push, it's
  fixed.  The ultimate resolution was simple.  Dave Kemper has been
  extremely helpful in identifying problems and regressions, catching me
  out in misconceptions, and compelling me to make sense after I fail to
  explain things cogently.

* All of the debugging features I mused about, except details of the
  character resolution process, are now implemented.  I'll share their
  man page descriptions and illustrate with example shell sessions.

1.  You can demand information about any ordinary, special, or indexed
    character.

groff(7):
     .pchar c ...
                Report, to the standard error stream, information about
                each ordinary or special character c.  A character
                defined by a request (char, fchar, fschar, or schar),
                reports its contents as a JSON‐encoded string, but the
                output is not otherwise in JSON format.

$ groff
.pchar a
character 'a'
  is not translated
  does not have a macro
  special translation: 0
  hyphenation code: 97
  flags: 0
  ASCII code: 97
  asciify code: 0
  is found
  is transparently translatable
  is not translatable as input
  mode: normal
.pchar \['a]
special character "'a"
  is not translated
  does not have a macro
  special translation: 0
  hyphenation code: 97
  flags: 0
  ASCII code: 0
  asciify code: 225
  is found
  is transparently translatable
  is translatable as input
  mode: normal
.pchar \N'65'
character indexed 65 in current font
  is not translated
  does not have a macro
  special translation: 0
  hyphenation code: 0
  flags: 0
  ASCII code: 0
  asciify code: 0
  is found
  is transparently translatable
  is not translatable as input
  mode: normal
.char \[happy] :-)
.pchar \[happy]
special character "happy"
  is not translated
  has a macro: "contents": ":-)"
  special translation: 0
  hyphenation code: 0
  flags: 0
  ASCII code: 0
  asciify code: 0
  is found
  is transparently translatable
  is not translatable as input
  mode: normal
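Since only the macro contents in this report are JSON-encoded, a script has to pick that piece out of the surrounding plain text.  Here is a minimal Python sketch of one way to do so.  It assumes the "has a macro:" line keeps the shape shown in the session above; the helper name `macro_contents` is hypothetical and not part of groff.

```python
import json

def macro_contents(report_line):
    """Return the decoded macro contents from a pchar report line, or None."""
    prefix = "has a macro: "
    stripped = report_line.strip()
    if not stripped.startswith(prefix):
        return None
    # The remainder looks like a bare JSON member, e.g. "contents": ":-)".
    # Wrapping it in braces yields a complete JSON object we can parse.
    return json.loads("{" + stripped[len(prefix):] + "}")["contents"]

print(macro_contents('  has a macro: "contents": ":-)"'))
print(macro_contents("  does not have a macro"))
```

The rest of the report stays plain text, so the sketch ignores every other line.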

2.  The new `pline` request is now much, much more powerful.  Because a
    node list is really a tree structure, to accurately report the node
    list corresponding to a pending input line, we needed recursive node
    dumping operations.  Now we have them.

groff(7):
     .pline     Report, in JSON syntax to the standard error stream, the
                list of output nodes corresponding to the pending output
                line.  In JSON, a pair of empty brackets “[ ]”
                represents an empty list.

$ printf 'Check out this \\%%Bu\[~n]uel flick.\n.pline\n' \
  | ./build/test-groff -z
[{"type": "line_start_node", "diversion level": 0, "is_special_node": false},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "C"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "h"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "e"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "c"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "k"},
{"type": "word_space_node", "diversion level": 0, "is_special_node": false, 
"hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, 
"terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width": 
2500 }], "unformat": false},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "o"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "u"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "t"},
{"type": "word_space_node", "diversion level": 0, "is_special_node": false, 
"hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, 
"terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width": 
2500 }], "unformat": false},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "t"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "h"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "i"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "s"},
{"type": "word_space_node", "diversion level": 0, "is_special_node": false, 
"hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, 
"terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width": 
2500 }], "unformat": false},
{"type": "hyphen_inhibitor_node", "diversion level": 0, "is_special_node": 
false},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "B"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "u"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false,
"special character": "~n"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "u"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "e"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "l"},
{"type": "word_space_node", "diversion level": 0, "is_special_node": false, 
"hunits": 2500, "undiscardable": false, "is hyphenless breakpoint": false, 
"terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width": 
2500 }], "unformat": false},
{"type": "ligature_node", "diversion level": 0, "is_special_node": false, "n1": 
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "f"}, "n2": {"type": "glyph_node", "diversion level": 0, 
"is_special_node": false, "character": "l"}},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "i"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "c"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "k"},
{"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "."},
{"type": "word_space_node", "diversion level": 0, "is_special_node": false, 
"hunits": 5000, "undiscardable": false, "is hyphenless breakpoint": false, 
"terminal_color": "default", "width_list": [{ "width": 2500, "sentence_width": 
2500 }], "unformat": false}]

That's a lot.  Send the standard error stream to jq(1) to make the
tree structure more obvious.

$ printf 'Check out this \\%%Bu\[~n]uel flick.\n.pline\n' \
  | ./build/test-groff -z 2>&1 | jq
[
  {
    "type": "line_start_node",
    "diversion level": 0,
    "is_special_node": false
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "C"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "h"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "e"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "c"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "k"
  },
  {
    "type": "word_space_node",
    "diversion level": 0,
    "is_special_node": false,
    "hunits": 2500,
    "undiscardable": false,
    "is hyphenless breakpoint": false,
    "terminal_color": "default",
    "width_list": [
      {
        "width": 2500,
        "sentence_width": 2500
      }
    ],
    "unformat": false
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "o"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "u"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "t"
  },
  {
    "type": "word_space_node",
    "diversion level": 0,
    "is_special_node": false,
    "hunits": 2500,
    "undiscardable": false,
    "is hyphenless breakpoint": false,
    "terminal_color": "default",
    "width_list": [
      {
        "width": 2500,
        "sentence_width": 2500
      }
    ],
    "unformat": false
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "t"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "h"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "i"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "s"
  },
  {
    "type": "word_space_node",
    "diversion level": 0,
    "is_special_node": false,
    "hunits": 2500,
    "undiscardable": false,
    "is hyphenless breakpoint": false,
    "terminal_color": "default",
    "width_list": [
      {
        "width": 2500,
        "sentence_width": 2500
      }
    ],
    "unformat": false
  },
  {
    "type": "hyphen_inhibitor_node",
    "diversion level": 0,
    "is_special_node": false
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "B"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "u"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "special character": "~n"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "u"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "e"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "l"
  },
  {
    "type": "word_space_node",
    "diversion level": 0,
    "is_special_node": false,
    "hunits": 2500,
    "undiscardable": false,
    "is hyphenless breakpoint": false,
    "terminal_color": "default",
    "width_list": [
      {
        "width": 2500,
        "sentence_width": 2500
      }
    ],
    "unformat": false
  },
  {
    "type": "ligature_node",
    "diversion level": 0,
    "is_special_node": false,
    "n1": {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "f"
    },
    "n2": {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "l"
    }
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "i"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "c"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "k"
  },
  {
    "type": "glyph_node",
    "diversion level": 0,
    "is_special_node": false,
    "character": "."
  },
  {
    "type": "word_space_node",
    "diversion level": 0,
    "is_special_node": false,
    "hunits": 5000,
    "undiscardable": false,
    "is hyphenless breakpoint": false,
    "terminal_color": "default",
    "width_list": [
      {
        "width": 2500,
        "sentence_width": 2500
      }
    ],
    "unformat": false
  }
]
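Because the dump is a tree rather than a flat list, any consumer needs a recursive walk, as the nested `ligature_node` shows.  The Python sketch below recovers approximate input text from such a dump.  It assumes the key names seen in the output above ("character", "special character", "type") and treats every word space as a single blank; the function names are hypothetical, and the sample is an abbreviated excerpt, not the full dump.

```python
import json

def node_text(node):
    """Recursively recover the input text that one node stands for."""
    if "character" in node:
        return node["character"]
    if "special character" in node:
        return "\\[" + node["special character"] + "]"
    if node["type"] == "word_space_node":
        return " "
    # Composite nodes (such as ligature_node) nest child nodes as values;
    # nodes with no text of their own contribute an empty string.
    return "".join(node_text(v) for v in node.values() if isinstance(v, dict))

def line_text(dump):
    """Concatenate the text of every top-level node in a pline dump."""
    return "".join(node_text(n) for n in json.loads(dump))

# Abbreviated excerpt of the dump above (word space fields trimmed).
SAMPLE = r'''
[{"type": "line_start_node", "diversion level": 0, "is_special_node": false},
 {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "u"},
 {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "special character": "~n"},
 {"type": "word_space_node", "diversion level": 0, "is_special_node": false, "hunits": 2500},
 {"type": "ligature_node", "diversion level": 0, "is_special_node": false,
  "n1": {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "f"},
  "n2": {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "l"}},
 {"type": "glyph_node", "diversion level": 0, "is_special_node": false, "character": "i"}]
'''

print(line_text(SAMPLE))
```

Non-printing nodes such as `hyphen_inhibitor_node` fall through to the generic branch and yield nothing, so the `\%` in the input is not reconstructed.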

3.  The `pm` request now (optionally) accepts a list of names to dump.
    (Its behavior when given no arguments is unchanged.)

groff(7):
     .pm        Report, to the standard error stream, the names of all
                defined macros, strings, and diversions and their sizes
                in bytes.
     .pm name ...
                Report, to the standard error stream, the name and JSON‐
                encoded contents of each macro, string, or diversion
                name.

$ printf '.ds mystring " hello, \\[dq]world\\[dq]\n.pm mystring\n' \
  | ./build/test-groff -ms
{"name": "mystring", "contents": " hello, \\[dq]world\\[dq]"}

Caution: a single backslash has to be escaped both for printf(1) on the
way in, and for correct JSON representation on the way out.  So there's
really only one backslash before each `[dq]` in this example.  With that
in mind, we can see that string definitions are read in copy mode just
as the documentation has always claimed.  Also observe the leading space
in the string contents.
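To check the backslash count without counting by eye, one can feed the dump line to any JSON parser.  This Python sketch pastes the line from the session above into a raw string, so Python itself adds no further layer of escaping.

```python
import json

# The dump line exactly as groff printed it.  The r'' prefix keeps Python
# from interpreting the backslashes before the JSON parser sees them.
dump = r'{"name": "mystring", "contents": " hello, \\[dq]world\\[dq]"}'

contents = json.loads(dump)["contents"]
print(repr(contents[0]))       # the leading space survives
print(contents.count("\\"))    # backslashes actually stored in the string
```

After decoding, the string holds two backslashes in total, one before each `[dq]`.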

$ echo '.pm LP' | ./build/test-groff -ms
{"name": "LP", "contents": ".if !'\\n[.z]'' \u0016\u0011.\tbr\n.di\n.\u0017\n.br\n.cov*ab-init\n.cov*print\n.nop \\*[\\$0]\\\n"}

    The disclosure of GNU troff's encoding technique for certain tokens
    is a mixed blessing.  On the one hand, no one can be expected to
    know what these JSON-encoded C0 control characters mean off the top
    of their head, and they'll have to consult "src/roff/troff/input.h"
    in the groff source tree to decode them.  On the other hand,
    exposure of this information, formerly impossible outside of a GDB
    session, should be a boon to developers and ambitious macro
    programmers.
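As a first step toward decoding, it helps to list which control characters a dump actually contains.  The Python sketch below does that for the `LP` dump shown above, rejoined onto one line.  It deliberately assigns no meaning to the code points; those still have to be looked up in the header mentioned above.

```python
import json

# The LP dump from the session above, as a single line in a raw string.
dump = r'''{"name": "LP", "contents": ".if !'\\n[.z]'' \u0016\u0011.\tbr\n.di\n.\u0017\n.br\n.cov*ab-init\n.cov*print\n.nop \\*[\\$0]\\\n"}'''

contents = json.loads(dump)["contents"]

# Collect C0 control characters other than the ordinary tab and newline.
controls = sorted({ord(ch) for ch in contents
                   if ord(ch) < 0x20 and ch not in "\t\n"})
print([f"U+{cp:04X}" for cp in controls])
```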

4.  Did you notice the word "diversions" in the previous item?
    Implementing this feature cleared up some confusion I had about the
    nature of the `macro_header` class inside GNU troff.  In my earlier
    message I wondered why it contained objects of both `char_list` and
    `node_list` types.  Now I know.  These could have been wrapped in a
    C/C++ `union`.  (In Ada, we'd use a "discriminated record".)  Macros
    and strings use only the `char_list`.  Diversions use only the
    `node_list`.  This made implementation of the dumping feature
    straightforward.  It also means that diversion dumping can be even
    more chatty than dumping the pending output line node list.  Here's
    the example I put in the commit message.

$ printf '.di foo\nABC.\n.sp\nDEF\n.br\n.di\n.pm foo\n' \
  | build/test-groff -z 2>&1
{"name": "foo", "contents": [{"type": "line_start_node", "diversion level": 0, 
"is_special_node": false}, {"type": "glyph_node", "diversion level": 0, 
"is_special_node": false, "character": "A"}, {"type": "glyph_node",
"diversion level": 0, "is_special_node": false, "character": "B"},
{"type": "glyph_node",
"diversion level": 0, "is_special_node": false, "character": "C"}, {"type": 
"glyph_node", "diversion level": 0, "is_special_node": false, "character": 
"."}, {"type": "vertical_size_node", "diversion level": 0, "is_special_node": 
false, "vunits": -12000}, {"type": "vertical_size_node", "diversion level": 0, 
"is_special_node": false, "vunits": 0}, {"type": "diverted_space_node", 
"diversion level": 0, "is_special_node": false, "vunits": 12000}, {"type": 
"line_start_node", "diversion level": 0, "is_special_node": false}, {"type": 
"glyph_node", "diversion level": 0, "is_special_node": false, "character": 
"D"}, {"type": "glyph_node", "diversion level": 0, "is_special_node": false, 
"character": "E"}, {"type": "glyph_node", "diversion level": 0, 
"is_special_node": false, "character": "F"}, {"type": "vertical_size_node", 
"diversion level": 0, "is_special_node": false, "vunits": -12000}, {"type": 
"vertical_size_node", "diversion level": 0, "is_special_node": false, "vunits": 
0}]}
$ printf '.di foo\nABC.\n.sp\nDEF\n.br\n.di\n.pm foo\n' \
  | build/test-groff -z 2>&1 | jq
{
  "name": "foo",
  "contents": [
    {
      "type": "line_start_node",
      "diversion level": 0,
      "is_special_node": false
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "A"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "B"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "C"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "."
    },
    {
      "type": "vertical_size_node",
      "diversion level": 0,
      "is_special_node": false,
      "vunits": -12000
    },
    {
      "type": "vertical_size_node",
      "diversion level": 0,
      "is_special_node": false,
      "vunits": 0
    },
    {
      "type": "diverted_space_node",
      "diversion level": 0,
      "is_special_node": false,
      "vunits": 12000
    },
    {
      "type": "line_start_node",
      "diversion level": 0,
      "is_special_node": false
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "D"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "E"
    },
    {
      "type": "glyph_node",
      "diversion level": 0,
      "is_special_node": false,
      "character": "F"
    },
    {
      "type": "vertical_size_node",
      "diversion level": 0,
      "is_special_node": false,
      "vunits": -12000
    },
    {
      "type": "vertical_size_node",
      "diversion level": 0,
      "is_special_node": false,
      "vunits": 0
    }
  ]
}

    In practice, a diversion may contain up to an entire page of
    formatted text, so I expect some dumps to be huge.  But the user
    can now inspect them in minute detail.
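When a dump is too large to read, a summary by node type is a reasonable first look.  The Python sketch below counts the top-level node types in a `pm` diversion dump.  It assumes the "name", "contents", and "type" keys seen above; the `summarize` helper is hypothetical, and the sample is an abbreviated excerpt of the `foo` dump.

```python
import json
from collections import Counter

def summarize(dump):
    """Return the diversion name and a count of its top-level node types."""
    entry = json.loads(dump)
    kinds = Counter(node["type"] for node in entry["contents"])
    return entry["name"], kinds

# Abbreviated excerpt of the "foo" dump above (fields trimmed).
SAMPLE = '''
{"name": "foo", "contents": [
  {"type": "line_start_node"},
  {"type": "glyph_node", "character": "A"},
  {"type": "vertical_size_node", "vunits": -12000},
  {"type": "vertical_size_node", "vunits": 0},
  {"type": "diverted_space_node", "vunits": 12000}]}
'''

name, kinds = summarize(SAMPLE)
print(name)
for kind, count in kinds.most_common():
    print(f"{count:4d} {kind}")
```

The count covers only top-level nodes; children nested inside composite nodes would need a recursive walk.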

Next steps:

* I need to know from this community what, if anything, should now gate
  RC1.  I don't plan on a code freeze until RC2, but I don't want to
  mess with the formatter anymore, except to possibly do one thing I've
  already worked up and tested.

* Review the Savannah 1.24.0 release goals ticket.

  https://savannah.gnu.org/bugs/?65099

  Deri's patiently been awaiting my feedback on his contribution of PDF
  superpowers to the ms package, which could easily be added to the
  goals.  As illustrated on this list, it seems to work fine with the
  reasonably complex ms.ms document.  Getting first-class PDF support
  into all our full-service macro packages is, I think, a prerequisite
  to making the default output device PDF.  Maybe more than a
  "prerequisite": once we have that support, I'm finding it hard to
  imagine reasons _not_ to change the default output device thus.  I
  don't think we'll get the same support into groff_mm(7) or
  groff_me(7) in time for 1.24.0.
  (Nobody's working on these tasks, and I don't want to wait/gate on
  myself to take care of them.)

* The one respect in which I'm contemplating still changing the
  formatter itself is this:

diff --git a/src/roff/troff/env.cpp b/src/roff/troff/env.cpp
index 37dd7954c..3fcc6c098 100644
--- a/src/roff/troff/env.cpp
+++ b/src/roff/troff/env.cpp
@@ -2543,6 +2543,8 @@ void environment::do_break(bool want_adjustment)
        break;
       }
     }
+    if (getenv("GROFF_DUMP") != 0 /* nullptr */)
+      curenv->dump_pending_nodes();
     node *tem = line;
     line = 0 /* nullptr */;
     output_line(tem, width_total, was_centered);

That's all.  What does this do?  It tells GNU troff to do the equivalent
of `pline` every time it's about to perform a break.

What does that mean?  You get a complete node graph of your document.
Because like Osiris, a *roff's node-generation procedure dies and is
born again with every new output line,[1] this graph is, more precisely,
a linear forest: a list of trees.  (A hedgerow?  Bustling since 1971?)

This is something mandoc(1) has had for years.  Now we can have it too.
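A list of trees arriving on one stream is not a single JSON document, so a consumer has to split it.  The Python sketch below does this with the standard library's `JSONDecoder.raw_decode`.  It assumes that each break would emit one complete JSON array, back to back and separated only by whitespace; since the patch is uncommitted, that output shape is a guess, and the sample stream is made up for illustration.

```python
import json

def parse_forest(text):
    """Split a stream of back-to-back JSON values into a list of trees."""
    decoder = json.JSONDecoder()
    trees, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        if text[pos].isspace():
            pos += 1
            continue
        tree, pos = decoder.raw_decode(text, pos)
        trees.append(tree)
    return trees

# Hypothetical stream: two output lines, hence two node lists.
stream = (
    '[{"type": "line_start_node"}]\n'
    '[{"type": "line_start_node"}, {"type": "glyph_node", "character": "A"}]\n'
)

forest = parse_forest(stream)
print(len(forest), [len(tree) for tree in forest])
```

Any other diagnostics sharing the standard error stream would break this parse, which may be worth weighing in the interface decision.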

The reason I haven't already committed this is that it requires an
interface decision.  Use an environment variable?  If so, named what?
Use a command-line option?  If so, which letter do we want to
permanently eat for it?

Not many are available:

groff(1):
     groff [-abcCeEgGijklNpRsStUVXzZ] [-d ctext] [-d string=text]
           [-D fallback‐encoding] [-f font‐family] [-F font‐directory]
           [-I inclusion‐directory] [-K input‐encoding] [-L spooler‐
           argument] [-m macro‐package] [-M macro‐directory] [-n page‐
           number] [-o page‐list] [-P postprocessor‐argument]
           [-r cnumeric‐expression] [-r register=numeric‐expression]
           [-T output‐device] [-w warning‐category] [-W warning‐
           category] [file ...]

I want to hear your feedback on all of the questions above.

Regards,
Branden

[1] I'll bet the main reason for this was to reduce the memory footprint
    of the implementation back in core-starved PDP-11 days.
