[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

G. Branden Robinson Tue, 16 Jan 2024 06:54:17 -0800

Follow-up Comment #28, bug#63074 (group groff):

Hi Deri,


[comment #27:]
> Yes, I want .device to pass everything exactly "as-is" just as it does now,
> all escapes left untouched.

Unfortunately, that is inconsistent with how \X works.


$ cat EXPERIMENTS/device-foolery.groff 
.sp
.device ps: \fB
.device ps: \s0
.device ps: \-
.device ps: \[*A]
.device ps: \[u0391]
$ ~/groff-1.22.3/bin/groff -Z EXPERIMENTS/device-foolery.groff | grep '^x X'
x X ps: \fB
x X ps: \s0
x X ps: \-
x X ps: \[*A]
x X ps: \[u0391]


The foregoing output is the same in _groff_ 1.22.4 and 1.23.0.  *But*...


$ cat EXPERIMENTS/backslash-X-foolery.roff 
.sp
\X'ps: \fB'\
\X'ps: \s0'\
\X'ps: \-'\
\X'ps: \[*A]'\
\X'ps: \[u0391]'
$ ~/groff-1.22.3/bin/groff -Z EXPERIMENTS/backslash-X-foolery.roff | grep '^x X'
EXPERIMENTS/backslash-X-foolery.roff:4: a special character is invalid within \X
EXPERIMENTS/backslash-X-foolery.roff:5: a special character is invalid within \X
EXPERIMENTS/backslash-X-foolery.roff:6: a special character is invalid within \X
x X ps: 
x X ps: 
x X ps: 
x X ps: 
x X ps: 


The foregoing output is the same in _groff_ 1.22.4.  In _groff_ 1.23.0, it
differs.


$ ~/groff-stable/bin/groff -Z EXPERIMENTS/backslash-X-foolery.roff | grep '^x X'
x X ps: 
x X ps: 
x X ps: -
x X ps: 
x X ps: 


This was the outcome of bug #61401 (October 2021).

I'd like to get `.device` and `\X` treating their arguments the same way,
hence bug #64484.  You can still pass things representing arbitrary byte
sequences to the output driver--but if you want to represent them with _groff_
escape sequences, you will need to further escape them.


$ cat EXPERIMENTS/backslash-X-escapery.roff
.sp
\X'ps: \\fB'\
\X'ps: \\s0'\
\X'ps: \\-'\
\X'ps: \\[*A]'\
\X'ps: \\[u0391]'
$ ~/groff-HEAD/bin/groff -Z EXPERIMENTS/backslash-X-escapery.roff | grep '^x X'
x X ps: \fB
x X ps: \s0
x X ps: \-
x X ps: \[*A]
x X ps: \[u0391]


And `device` request behavior is now (in Git) consistent with this.


$ cat ./EXPERIMENTS/device-escapery.groff
.sp
.device ps: \\fB
.device ps: \\s0
.device ps: \\-
.device ps: \\[*A]
.device ps: \\[u0391]
$ ~/groff-HEAD/bin/groff -Z EXPERIMENTS/device-escapery.groff | grep '^x X'
x X ps: \fB
x X ps: \s0
x X ps: \-
x X ps: \[*A]
x X ps: \[u0391]
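
If you generate device control commands from a script, the extra level of escaping shown in the two examples above is mechanical.  Here is a minimal sketch in Python, assuming the default escape character (a backslash) and a payload that does not contain the apostrophe used as the `\X` delimiter.  The function names are hypothetical; nothing here is part of _groff_.

```python
def double_escapes(payload: str, escape_char: str = "\\") -> str:
    """Double each escape character so that one level survives troff."""
    return payload.replace(escape_char, escape_char * 2)


def as_device_request(payload: str) -> str:
    """Build a `device` request line carrying the payload."""
    return ".device " + double_escapes(payload)


def as_x_escape(payload: str) -> str:
    """Build an X escape sequence delimited by apostrophes."""
    if "'" in payload:
        # Assumption: we refuse rather than pick another delimiter.
        raise ValueError("payload contains the delimiter character")
    return "\\X'" + double_escapes(payload) + "'"


if __name__ == "__main__":
    wanted = [r"ps: \fB", r"ps: \s0", r"ps: \-", r"ps: \[*A]", r"ps: \[u0391]"]
    for payload in wanted:
        print(as_device_request(payload))
```

Run as a script, this prints the five `.device` lines of device-escapery.groff shown above.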


Here is what our Texinfo manual now says (in Git) about `device` and `\X`.


5.34 Postprocessor Access
=========================

Two escape sequences and two requests enable documents to pass
information directly to an output driver or other postprocessor.  These
are useful for exercising device-specific capabilities that the 'groff'
language does not abstract or generalize; examples include the embedding
of hyperlinks and image files.  Device-specific functions are documented
in each output driver's man page, such as 'gropdf(1)', 'grops(1)', or
'grotty(1)'.

 -- Request: .device xxx ...
 -- Escape sequence: \X'''xxx ...'''
     Embed all XXX arguments into GNU 'troff' output as parameters to an
     'x X' device control command.(1)  (*note Postprocessor
     Access-Footnote-1::) The meaning and interpretation of such
     parameters is determined by the output driver or other
     postprocessor.

     The 'device' request strips an initial neutral double quote from
     CONTENTS to allow embedding of leading spaces.

     Within a device control command, the escape sequences '\&', '\)',
     '\%', and '\:' are ignored; '\<SPC>' and '\~' are converted to
     single space characters; and '\\' has its escape character
     stripped.  So that the basic Latin subset of the Unicode character
     set(2) (*note Postprocessor Access-Footnote-2::) can be reliably
     encoded in device control commands, seven special character escape
     sequences ('\-', '\[aq]', '\[dq]', '\[ga]', '\[ha]', '\[rs]', and
     '\[ti]') are mapped to basic Latin characters; see the
     'groff_char(7)' man page.  For this transformation, character
     translations and special character definitions are ignored.(3)
     (*note Postprocessor Access-Footnote-3::)

     Escape sequences other than the foregoing in device control commands
     may be ignored, or produce an error.

     A device control command issued with the 'device' request will not
     be reflected in the output unless a partially collected line exists
     at least once in the top-level diversion (recall *note
     Diversions::).  When experimenting with such device controls in
     minimal documents, a 'br' request will ensure this to be the case.

     If the 'use_charnames_in_special' directive appears in the output
     device's 'DESC' file, the use of special character escape sequences
     is _not_ an error; they are simply output verbatim (with the
     exception of the seven mapped to Unicode basic Latin characters,
     discussed above).  'use_charnames_in_special' is currently employed
     only by 'grohtml'.

[.devicem and \Y snipped]
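
The rules in the manual text quoted above can be modeled roughly as follows.  This is only a sketch of the documented behavior, written in Python for illustration; it is not the code in GNU _troff_.  It makes several simplifying assumptions: the escape character is a backslash, special characters are written only in the `\[xx]` form, `use_charnames_in_special` is not in effect, and any other escape sequence is reported as an error (the manual says it may instead be ignored).  The name `model_device_control` is hypothetical.

```python
IGNORED = set("&)%:")          # \&  \)  \%  \:  are dropped
TO_SPACE = set(" ~")           # \<SPC> and \~ become one space
BASIC_LATIN = {                # special characters mapped to basic Latin
    "aq": "'", "dq": '"', "ga": "`", "ha": "^", "rs": "\\", "ti": "~",
}


def model_device_control(arg: str) -> str:
    """Approximate what reaches the 'x X' command for a given argument."""
    out = []
    i = 0
    while i < len(arg):
        ch = arg[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(arg):
            raise ValueError("dangling escape character")
        nxt = arg[i + 1]
        if nxt in IGNORED:
            i += 2
        elif nxt in TO_SPACE:
            out.append(" ")
            i += 2
        elif nxt == "\\":
            out.append("\\")   # escape character stripped
            i += 2
        elif nxt == "-":
            out.append("-")    # \- maps to the hyphen-minus
            i += 2
        elif nxt == "[":
            end = arg.find("]", i + 2)
            name = arg[i + 2:end] if end != -1 else None
            if name not in BASIC_LATIN:
                raise ValueError("unsupported special character")
            out.append(BASIC_LATIN[name])
            i = end + 1
        else:
            # Assumption: treat everything else as an error.
            raise ValueError("unsupported escape sequence \\" + nxt)
    return "".join(out)
```

Under this model, `ps: \\fB` comes out as `ps: \fB`, whereas `ps: \fB` is rejected.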


And since I changed `device` to stop reading its argument in copy mode (there's
really only one, scanned until the end of the line, like `tm`), here is the
"NEWS" entry.


o The `device` request no longer reads its arguments in copy mode; this 
  change makes it more consistent with the `\X` device control command  
  escape sequence.  This request also no longer emits a self-quoted     
  *roff escape character as itself, but instead as a backslash.
  (troff's input and output languages are not the same thing.)  These
  changes are to enable postprocessors to reliably interpret device     
  control commands that wish to express arbitrary byte sequences.  For  
  example, PDF bookmarks need to be expressed in UTF-16LE.


...but I perceive that (1) I should probably break that up into 2 items, and
(2) I should add diagnostics to `encode_char_for_troff_input()` when it
encounters something incomprehensible like a font selection or type size
escape sequence.

This part of the discussion is pretty much independent of
to-stringhex-or-not-to-stringhex (as you've been trying to tell me), so I will
reply to that separately.  If you have comments on the foregoing, please
direct them to bug #64484.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63074>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
