Hi folks,

Those of us who worked with groff 1.22.4 may remember a couple of diagnostic messages that gobsmacked one with their incomprehensibility.
Here's the source code that produced them.

    error("can't transparently output node at top level");

    error("can't translate %1 to special character '%2'"
          " in transparent throughput",
          input_char_description(cc),
          ci->nm.contents());

For groff 1.23.0, I silenced them with a blunt instrument, since I concluded that they weren't entirely spurious, but nobody seemed to have any idea what, exactly, anyone was supposed to do about them. In general usage scenarios, a diagnostic that the user cannot address is a diagnostic that should not be issued.

In a recent thread, Peter noted the resurrection of these messages.[1] At that time, I promised an explanation of these long-vexing problems.

In the course of documenting my fix for Savannah #63074[2]--a process that is fairly involved and not yet complete--I ended up writing a change to the groff_diff(7) man page that seems to cover the bases.

Here's some language that I have queued up for my next push.

First, a comment from the man page source that summarizes where we (I) still have work to do.

    .\" TODO: When we get this giant headache generalized and adapted to the
    .\" `\!` escape sequence and `device`, `output`, `cf`, and `trf`
    .\" requests, move this discussion into a dedicated subsection above.

And now, the explanation.

groff_diff(7):

    \X'contents'

    GNU troff transforms the argument to the device control escape sequence to avoid leaking to device‐independent output data that are unrepresentable in that format, and to address the problem of expressing character code points outside of the Unicode basic Latin range in an output file format that restricts itself to that range. (See subsection “Basic Latin” of groff_char(7).) The typesetting of such characters is a problem long‐solved in device‐independent troff by the “C” command; see groff_out(5). The expression of such characters in other contexts, such as device extension commands, was not addressed by the same design.
    Where possible, GNU troff represents such characters in device‐independent but non‐typesetting contexts using its notation for Unicode special character escape sequences; see subsection “Special character escape forms” of groff_char(7).

    GNU troff converts several ordinary characters that typeset as non‐basic Latin code points to code points outside that range to avoid confusion when these characters are used in ways that are ultimately visible, as in tag names for PDF bookmarks, which can appear in a viewer’s navigation pane. These ordinary characters are “'”, “-”, “^”, “`”, and “~”; others are written as‐is. Special characters that typeset as Unicode basic Latin characters are translated to basic Latin characters accordingly. For this transformation, character translations and definitions are ignored.

    So that any Unicode code point can be represented in device extension commands, for example in an author’s name in document metadata or as a usefully named bookmark or hyperlink anchor, GNU troff translates other special characters into their Unicode special character notation. Special characters without a Unicode representation, and escape sequences that do not interpolate a sequence of ordinary and/or special characters, produce warnings in category “char”.

I hope to boil some fat off of that.

I want to emphasize that error and warning diagnostics will remain possible. But they should not occur when a document or macro package attempts to do things that are "sane", like storing accented letters appearing in an author's name or a section heading into a string and then interpolating that string in a device extension/control escape sequence.

Peter's mom(7) package is more muscular even than that; I've noticed that in his example documents he is not shy even of including vertical motions in author names.

    contrib/mom/examples/mom-pdf.mom:.AUTHOR "Deri James" "\*[UP .5p]and" "Peter Schaffter"

Inside the formatter, a vertical motion becomes a "node" and has no possible representation in a device extension/control escape sequence.

In the future, I want the formatter to complain about such impossibilities, but not yet--it's not fair to document and macro package authors to be so prescriptive without providing a handy mechanism for cleaning such things out of strings that are destined for device extension/control commands.

To date, solutions have included creating a diversion, interpolating the string inside it, then using the `asciify` or `unformat` requests on it to strip things like vertical motions (and `chop`, of course, to rip out the undesired newline at the end of the diversion).

This is painful because formatting things into a diversion as a rule _creates_ more nodes than it eliminates. Hence the unformatting.

It would be cleaner and simpler to provide a mechanism for processing a string directly, discarding escape sequences (like vertical motions or break points [with or without hyphenation]). This point is even more emphatic because of the heavy representation of special characters in known use cases. To "sanitize" (or "pdfclean") such strings by round-tripping them through a process that converts a sequence of easily handled bytes like "\ [ 'a ]" or "\ [ u 0 4 1 1 ]" into a special character node and then back again seems wasteful and fragile to me.

But, to get things where I'd like to see them, we need an in-language string iterator for the groff language. And because strings, macros, and diversions can be punned with each other, in practice that means we will need an iterator that can handle any of these. That, in turn, means that we will also require a new conditional expression operator to test whether an element of a string is a "node".

I haven't been able to get all of that together in the year since we released groff 1.23.
My understanding of the formatter still has significant lacunae. Ah well. Maybe for 1.25.

So, in the meantime, my plan is to silently discard things from device extension/control commands that an output device would not be able to do anything useful with.

Thanks for your patience with this explan-a-thon.[3]

Regards,
Branden

[1] https://lists.gnu.org/archive/html/groff/2024-08/msg00045.html
[2] https://savannah.gnu.org/bugs/?63074
[3] In my opinion, the words "special" and "transparent" are the two most relentlessly and unhelpfully overused terms in troff literature. They each mean several different things, and make me a cross and grumpy (semi-official) groff maintainer. Expect documentary reforms addressing these ambiguities.
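
Addendum: for anyone who finds the \X transformation quoted from groff_diff(7) above easier to follow as code, here is a rough sketch in Python. It is not how GNU troff implements it (the formatter is C++ and works on tokens and nodes, not on decoded text). The function names are hypothetical, and the mapping table is my assumption, taken from the glyphs that groff_char(7) says these five ordinary characters typeset as. The sketch also assumes its input is already-decoded text, so it says nothing about escape sequences, nodes, or a literal backslash.

```python
# Illustrative sketch only; names and table are assumptions, not
# GNU troff's actual implementation.

# Ordinary characters that typeset as non-basic-Latin glyphs, mapped
# to the code points they are assumed to typeset as (groff_char(7)).
TYPESET_AS = {
    "'": 0x2019,  # right single quotation mark
    "-": 0x2010,  # hyphen
    "^": 0x02C6,  # modifier letter circumflex accent
    "`": 0x2018,  # left single quotation mark
    "~": 0x02DC,  # small tilde
}


def unicode_escape(code_point: int) -> str:
    """Return groff's Unicode special character notation, e.g. \\[u00E9]."""
    # At least four uppercase hexadecimal digits.
    return "\\[u%04X]" % code_point


def encode_for_device_extension(text: str) -> str:
    """Express text using only Unicode basic Latin characters."""
    out = []
    for ch in text:
        if ch in TYPESET_AS:
            # One of the five ordinary characters that typeset as
            # something outside the basic Latin range.
            out.append(unicode_escape(TYPESET_AS[ch]))
        elif ord(ch) < 0x80:
            # Other basic Latin characters are written as-is.
            out.append(ch)
        else:
            # Everything else becomes Unicode special character notation.
            out.append(unicode_escape(ord(ch)))
    return "".join(out)
```

With these assumptions, encode_for_device_extension("Jörg") yields J\[u00F6]rg, which is the sort of thing that can travel through device-independent output and still be turned back into a usefully named bookmark or author field by an output driver.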