Update of bug #64484 (group groff):

Summary: [troff] \X escape sequence should read its argument in copy mode
      => [troff] \X escape sequence should read its argument in (something like) copy mode

_______________________________________________________

Follow-up Comment #15:

Hi Deri,

[comment #14 comment #14:]
> I am a bit concerned about this. pdf.tmac contains various .device
> commands, which, if replaced by \X stop it working properly.

An understandable concern; one of the reasons this feature is taking a
while to land is that I am trying to figure out what the true
operational semantics of these direct-grout-generating requests and
escape sequences are.

Theoretically, there are four different ways to inject stuff into
device-independent output. (And that's leaving aside the `cf` and `trf`
requests, so there are six. At least.)

\X'this is a device control command'
.br
.device this is a device control command
.br
\!x X this is a device control command
.br
.output x X this is a device control command

Are all of these equivalent? Not quite--there are subtleties involving
line breaks, and
[https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/node.cpp?h=1.23.0#n880 even deeper ones involving state transitions of drawing parameters].

In my opinion, a device control command per se shouldn't imply a change
of any drawing parameters. If there is a need to record the fact that a
device control command "dirtied" the drawing position, font selection,
color configuration, and so on, there should be some separate mechanism
of telling the formatter that.

This is causing me frustration right now with "sboxes.tmac"; maybe the
`fl` request should be given some kind of state-dirtying semantics. (At
present, and historically, it doesn't do that.)

I won't push until I have it sorted out; I'm trying to avoid asking you
to change anything in any macro package.

> I have shown previously that gropdf is quite happy to receive groff
> nodes as 7 bit ascii i.e. the character "â" can be sent to gropdf as
> \[u00E2] (preconv) or \[^a] (groff special), both 7 bit clean.
> However, \X blocks both these uses with an error, which is a little
> confusing to a user because it refers to "special character '\^a'"
> even if \[u00E2] appears in the input to groff.

I agree that that's confusing. You might be pleased to know of a change
I have pending.

diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 89f4518c1..d08fe5e4c 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -5829,6 +5829,7 @@ static node *do_device_control() // \X
   return new special_node(mac);
 }

+# if 0
 static void device_request()
 {
   if (!has_arg(true /* peek; we want to read in copy mode */)) {
@@ -5849,15 +5850,49 @@ static void device_request()
   }
   if (curdiv == topdiv && topdiv->before_first_page)
     topdiv->begin_page();
-  // Null characters can correspond to node types like vmotion_node that
-  // are unrepresentable in a device control command, and got scrubbed
-  // by `asciify`.
-  for (; c != '\0' && c != '\n' && c != EOF;
+  for (; c != '\n' && c != EOF;
        c = get_copy(0 /* nullptr */))
-    mac.append(c);
+    encode_character_for_device_output(&mac, c);
   curenv->add_node(new special_node(mac));
   tok.next();
 }
+#endif
+
+static void device_request()
+{
+  if (!has_arg()) {
+    warning(WARN_MISSING, "device control request expects arguments");
+    skip_line();
+    return;
+  }
+  macro mac;
+  while (tok.is_space() || tok.is_tab())
+    tok.next();
+  if ('"' == tok.ch())
+    tok.next();
+  for (;;) {
+    unsigned char c;
+    if (tok.is_newline() || tok.is_eof())
+      break;
+    if (tok.is_space())
+      c = ' ';
+    else if (tok.is_tab())
+      c = '\t';
+    else if (tok.is_leader())
+      c = '\001';
+    else if (tok.is_backspace())
+      c = '\b';
+    else
+      c = tok.ch();
+    //assert(c != 0); // XXX: a node?
+    encode_character_for_device_output(&mac, c);
+    tok.next();
+  }
+  if (curdiv == topdiv && topdiv->before_first_page)
+    topdiv->begin_page();
+  curenv->add_node(new special_node(mac));
+  skip_line();
+}

 static void device_macro_request()
 {

I've enhanced a
[https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/tests/device-control-special-character-handling.sh regression test I added in January]
that attempts to ensure that processing of various special character
escape sequences comes through in device-independent output unmolested.

Here's the test input:

input='.
.nf
\X#bogus1: esc \%to-do\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]\[`a]#
.device bogus1: req \%to-do\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]\[`a]
.ec @
@X#bogus2: esc @%to-do@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]@[`a]##
.device bogus2: req @%to-do@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]@[`a]
.'

...and the results.[1]

$ (cd build && ../src/roff/groff/tests/device-control-special-character-handling.sh)
x X bogus1: esc to-do\[u1F00] -'"`^\~
x X bogus1: req @%to-do\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]\[`a]
x X bogus2: esc to-do\[u1F00] -'"`^\~
x X bogus2: req @%to-do@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]@[`a]
troff:<standard input>:2: error: special character 'u1F63C' cannot be used within a device control escape sequence
troff:<standard input>:2: error: special character '`a' cannot be used within a device control escape sequence
troff:<standard input>:5: error: special character 'u1F63C' cannot be used within a device control escape sequence
troff:<standard input>:5: error: special character '`a' cannot be used within a device control escape sequence
checking X escape sequence, default escape character
...FAILED
checking X escape sequence, alternate escape character
...FAILED
checking for errors on unsupported special character escapes

That doesn't look good, but when I add some code...

diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 229e7956e..041a455e7 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -5832,6 +5832,7 @@ static node *do_device_control() // \X
   return new special_node(mac);
 }

+# if 0
 static void device_request()
 {
   if (!has_arg(true /* peek; we want to read in copy mode */)) {
@@ -5852,15 +5853,49 @@ static void device_request()
   }
   if (curdiv == topdiv && topdiv->before_first_page)
     topdiv->begin_page();
-  // Null characters can correspond to node types like vmotion_node that
-  // are unrepresentable in a device control command, and got scrubbed
-  // by `asciify`.
-  for (; c != '\0' && c != '\n' && c != EOF;
+  for (; c != '\n' && c != EOF;
        c = get_copy(0 /* nullptr */))
-    mac.append(c);
+    encode_character_for_device_output(&mac, c);
   curenv->add_node(new special_node(mac));
   tok.next();
 }
+#endif
+
+static void device_request()
+{
+  if (!has_arg()) {
+    warning(WARN_MISSING, "device control request expects arguments");
+    skip_line();
+    return;
+  }
+  macro mac;
+  while (tok.is_space() || tok.is_tab())
+    tok.next();
+  if ('"' == tok.ch())
+    tok.next();
+  for (;;) {
+    unsigned char c;
+    if (tok.is_newline() || tok.is_eof())
+      break;
+    if (tok.is_space())
+      c = ' ';
+    else if (tok.is_tab())
+      c = '\t';
+    else if (tok.is_leader())
+      c = '\001';
+    else if (tok.is_backspace())
+      c = '\b';
+    else
+      c = tok.ch();
+    //assert(c != 0); // XXX: a node?
+    encode_character_for_device_output(&mac, c);
+    tok.next();
+  }
+  if (curdiv == topdiv && topdiv->before_first_page)
+    topdiv->begin_page();
+  curenv->add_node(new special_node(mac));
+  skip_line();
+}

 static void device_macro_request()
 {

...and with which the `BOXSTART` macro is unhappy (the page background
turns completely black), I get the following.

$ (cd build && ../src/roff/groff/tests/device-control-special-character-handling.sh)
x X bogus1: esc to-do\[u1F63C]\[u1F00] -'"`^\~\[u00E0]
x X bogus1: req to-do\[u1F63C]\[u1F00] -'"`^\~\[u00E0]
x X bogus2: esc to-do\[u1F63C]\[u1F00] -'"`^\~\[u00E0]
x X bogus2: req to-do\[u1F63C]\[u1F00] -'"`^\~\[u00E0]
checking X escape sequence, default escape character
checking X escape sequence, alternate escape character
checking for errors on unsupported special character escapes

That's just miles better.

> If I understand correctly, your plan is linked to the work you have
> been doing with filenames (bug #65108, comment 3), which outlines
> your parsing rules. The restriction in your rule 5d would prevent
> "strings" of characters in other languages to be used. Many file
> systems allow utf-8 in filenames:-
>
> -rw-r--r-- 1 derij derij 0 Aug 26 21:40 αβγ.greek
>
> Which would fail 5d, and would prevent pdf bookmarks in any language
> except basic latin or latin-1 supplement, if these rules are extended
> to .device.
>
> Have I understood this correctly?

I'm not sure. I don't think so. Running my working copy with the above
BOXSTART-breaking patch applied, I can do this.

$ cat ATTIC/for-deri.man
.TH for\-deri 1 2024-08-26 "a demo for Deri"
.SH Name
for\-deri \- a sample command
.SH Description
What program requires documentation?
.SH αβγ.greek
That was some Greek,
and it will end up in a device control command when we format this man page for PDF.

$ ./build/test-groff -K utf8 -man -T pdf -Z ATTIC/for-deri.man | grep '^x X'
x X ps:exec [/Dest /for\-deri(1) /View [/FitH -26000 u] /DEST pdfmark
x X ps:exec [/Dest /for\-deri(1) /Title (for\-deri(1)) /Level 1 /OUT pdfmark
x X pdf: markrestart
x X ps:exec [/Dest /pdf:bm2 /View [/FitH -57000 u] /DEST pdfmark
x X ps:exec [/Dest /pdf:bm2 /Title (Name) /Level 2 /OUT pdfmark
x X devtag:.NH 1
x X devtag:.eo.h
x X ps:exec [/Dest /pdf:bm3 /View [/FitH -85800 u] /DEST pdfmark
x X ps:exec [/Dest /pdf:bm3 /Title (Description) /Level 2 /OUT pdfmark
x X devtag:.NH 1
x X devtag:.eo.h
x X ps:exec [/Dest /pdf:bm4 /View [/FitH -114600 u] /DEST pdfmark
x X ps:exec [/Dest /pdf:bm4 /Title (\[u03B1]\[u03B2]\[u03B3].greek) /Level 2 /OUT pdfmark
x X devtag:.NH 1
x X devtag:.eo.h
x X pdf: marksuspend

...so the Greek seems to show up fine. Actually, the "grout" output is
unchanged from what's on Savannah's HEAD right now.

So, to that extent, my plans are to _not_ break what you're afraid I'm
going to break. I think.

Does this illuminate things?

Despite this ticket's postponed status, it might end up fixed as part of
getting bug #63074 over the finish line. But only as much of it as I
need for that purpose. Time will tell if that's the whole enchilada for
this ticket.

Regards,
Branden

[1] In case anyone's curious what older _groffs_ did with that...

$ (cd build && ../src/roff/groff/tests/device-control-special-character-handling.sh)
GNU groff version 1.23.0
x X bogus1: esc to-do\[u1F00] -'"`^\~
x X bogus1: req @%to-do\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]\[`a]
x X bogus2: esc to-do@[u1F00] -'"`^\~
x X bogus2: req @%to-do@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]@[`a]
checking X escape sequence, default escape character
...FAILED
checking X escape sequence, alternate escape character
...FAILED
checking for errors on unsupported special character escapes

$ (cd build && ../src/roff/groff/tests/device-control-special-character-handling.sh)
GNU groff version 1.22.4
x X bogus1: esc to-do\[u1F00] -
x X bogus1: req @%to-do\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]\[`a]
x X bogus2: esc to-do@[u1F00] -
x X bogus2: req @%to-do@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]@[`a]
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:3: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
troff: <standard input>:6: a special character is invalid within \X
checking X escape sequence, default escape character
...FAILED
checking X escape sequence, alternate escape character
...FAILED
checking for errors on unsupported special character escapes

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64484>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/