Follow-up Comment #28, bug #63074 (group groff):

Hi Deri,

[comment #27:]

> Yes, I want .device to pass everything exactly "as-is" just as it does now, all escapes left untouched.

Unfortunately, that is inconsistent with how \X works.

$ cat EXPERIMENTS/device-foolery.groff
.sp
.device ps: \fB
.device ps: \s0
.device ps: \-
.device ps: \[*A]
.device ps: \[u0391]
$ ~/groff-1.22.3/bin/groff -Z EXPERIMENTS/device-foolery.groff | grep '^x X'
x X ps: \fB
x X ps: \s0
x X ps: \-
x X ps: \[*A]
x X ps: \[u0391]

The foregoing output is the same in _groff_ 1.22.4 and 1.23.0.

*But*...

$ cat EXPERIMENTS/backslash-X-foolery.roff
.sp
\X'ps: \fB'\
\X'ps: \s0'\
\X'ps: \-'\
\X'ps: \[*A]'\
\X'ps: \[u0391]'
$ ~/groff-1.22.3/bin/groff -Z EXPERIMENTS/backslash-X-foolery.roff | grep '^x X'
EXPERIMENTS/backslash-X-foolery.roff:4: a special character is invalid within \X
EXPERIMENTS/backslash-X-foolery.roff:5: a special character is invalid within \X
EXPERIMENTS/backslash-X-foolery.roff:6: a special character is invalid within \X
x X ps:
x X ps:
x X ps:
x X ps:
x X ps:

The foregoing output is the same in _groff_ 1.22.4. In _groff_ 1.23.0, it differs.

$ ~/groff-stable/bin/groff -Z EXPERIMENTS/backslash-X-foolery.roff | grep '^x X'
x X ps:
x X ps:
x X ps: -
x X ps:
x X ps:

This was the outcome of bug #61401 (October 2021).

I'd like to get `.device` and `\X` treating their arguments the same way, hence bug #64484.

You can still pass things representing arbitrary byte sequences to the output driver--but if you want to represent them with _groff_ escape sequences, you will need to further escape them.

$ cat EXPERIMENTS/backslash-X-escapery.roff
.sp
\X'ps: \\fB'\
\X'ps: \\s0'\
\X'ps: \\-'\
\X'ps: \\[*A]'\
\X'ps: \\[u0391]'
$ ~/groff-HEAD/bin/groff -Z EXPERIMENTS/backslash-X-escapery.roff | grep '^x X'
x X ps: \fB
x X ps: \s0
x X ps: \-
x X ps: \[*A]
x X ps: \[u0391]

And `device` request behavior is now (in Git) consistent with this.

$ cat ./EXPERIMENTS/device-escapery.groff
.sp
.device ps: \\fB
.device ps: \\s0
.device ps: \\-
.device ps: \\[*A]
.device ps: \\[u0391]
$ ~/groff-HEAD/bin/groff -Z EXPERIMENTS/device-escapery.groff | grep '^x X'
x X ps: \fB
x X ps: \s0
x X ps: \-
x X ps: \[*A]
x X ps: \[u0391]

Here is what our Texinfo manual now says (in Git) about `device` and `\X`.

5.34 Postprocessor Access
=========================

Two escape sequences and two requests enable documents to pass
information directly to an output driver or other postprocessor.  These
are useful for exercising device-specific capabilities that the 'groff'
language does not abstract or generalize; examples include the embedding
of hyperlinks and image files.  Device-specific functions are documented
in each output driver's man page, such as 'gropdf(1)', 'grops(1)', or
'grotty(1)'.

 -- Request: .device xxx ...
 -- Escape sequence: \X'''xxx ...'''
     Embed all XXX arguments into GNU 'troff' output as parameters to an
     'x X' device control command.(1) (*note Postprocessor
     Access-Footnote-1::) The meaning and interpretation of such
     parameters is determined by the output driver or other
     postprocessor.

     The 'device' request strips an initial neutral double quote from
     CONTENTS to allow embedding of leading spaces.

     Within a device control command, the escape sequences '\&', '\)',
     '\%', and '\:' are ignored; '\<SPC>' and '\~' are converted to
     single space characters; and '\\' has its escape character
     stripped.  So that the basic Latin subset of the Unicode character
     set(2) (*note Postprocessor Access-Footnote-2::) can be reliably
     encoded in device control commands, seven special character escape
     sequences ('\-', '\[aq]', '\[dq]', '\[ga]', '\[ha]', '\[rs]', and
     '\[ti]') are mapped to basic Latin characters; see the
     'groff_char(7)' man page.
     For this transformation, character translations and special
     character definitions are ignored.(3) (*note Postprocessor
     Access-Footnote-3::) Escape sequences other than the foregoing in
     device control commands may be ignored, or produce an error.

     A device control command issued with the 'device' request will not
     be reflected in the output unless a partially collected line exists
     at least once in the top-level diversion (recall *note
     Diversions::).  When experimenting with such device controls in
     minimal documents, a 'br' request will ensure this to be the case.

     If the 'use_charnames_in_special' directive appears in the output
     device's 'DESC' file, the use of special character escape sequences
     is _not_ an error; they are simply output verbatim (with the
     exception of the seven mapped to Unicode basic Latin characters,
     discussed above).  'use_charnames_in_special' is currently employed
     only by 'grohtml'.

[.devicem and \Y snipped]

And since I changed `device` to stop reading its argument in copy mode (there's really only one, scanned until the end of the line, like `tm`), here is the "NEWS" entry.

o  The `device` request no longer reads its arguments in copy mode; this
   change makes it more consistent with the `\X` device control command
   escape sequence.  This request also no longer emits a self-quoted
   *roff escape character as itself, but instead as a backslash.
   (troff's input and output languages are not the same thing.)  These
   changes are to enable postprocessors to reliably interpret device
   control commands that wish to express arbitrary byte sequences.  For
   example, PDF bookmarks need to be expressed in UTF-16LE.

...but I perceive that (1) I should probably break that up into 2 items, and (2) I should add diagnostics to `encode_char_for_troff_input()` when it encounters something incomprehensible like a font selection or type size escape sequence.

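For anyone who wants to poke at the rules quoted from the manual without building groff from Git, here is a rough Python model of them. It is only a sketch: the function name `encode_device_control` and the table names are made up for illustration, it is not a port of `encode_char_for_troff_input()`, and it assumes that any escape sequence the manual does not list is treated as an error (the manual says such sequences "may be ignored, or produce an error").

```python
# Toy model of the device control command rules quoted from the manual.
# All names here are hypothetical; this is not groff's implementation,
# and escape sequence parsing is deliberately simplified.

IGNORED = {"&", ")", "%", ":"}   # \&, \), \%, and \: are dropped
TO_SPACE = {" ", "~"}            # "\ " and \~ become a single space
BASIC_LATIN = {                  # bracketed special characters that are mapped
    "aq": "'", "dq": '"', "ga": "`", "ha": "^", "rs": "\\", "ti": "~",
}


def encode_device_control(arg: str) -> str:
    """Return the text that would follow 'x X ' under the quoted rules."""
    out = []
    i = 0
    while i < len(arg):
        ch = arg[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(arg):
            raise ValueError("trailing escape character")
        nxt = arg[i + 1]
        if nxt in IGNORED:
            i += 2
        elif nxt in TO_SPACE:
            out.append(" ")
            i += 2
        elif nxt == "\\":
            out.append("\\")     # '\\' has its escape character stripped
            i += 2
        elif nxt == "-":
            out.append("-")      # \- is the seventh mapped special character
            i += 2
        elif nxt == "[":
            end = arg.find("]", i + 2)
            name = arg[i + 2:end] if end != -1 else None
            if name not in BASIC_LATIN:
                raise ValueError(f"unsupported special character at offset {i}")
            out.append(BASIC_LATIN[name])
            i = end + 1
        else:
            # Assumption: anything else (font selection, type size, ...)
            # is reported rather than silently dropped.
            raise ValueError(f"unsupported escape sequence {arg[i:i + 2]!r}")
    return "".join(out)
```

Under this model, the doubled-backslash inputs from the "escapery" files come out with a single backslash, and a bare `\fB` is rejected, which is roughly the diagnostic behavior contemplated in item (2).
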
This part of the discussion is pretty much independent of to-stringhex-or-not-to-stringhex (as you've been trying to tell me), so I will reply to that separately.

If you have comments on the foregoing, please direct them to bug #64484.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63074>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/