Follow-up Comment #25, bug#63074 (group groff):

[comment #24 comment #24:]
> Bug #64484 is marked as fixed.

Right, but I believe there was a relationship nevertheless.

> I already have a reliable way to pass byte sequences in device control commands, .stringhex.

Okay. But it didn't do anything about this failing test case (which admittedly didn't exist until I started to research this issue).

https://git.savannah.gnu.org/cgit/groff.git/diff/src/roff/groff/tests/device-control-special-character-handling.sh?id=974c063f0a9e1ef6c0d2cac4755a3b9d6e925b0d

Of which the salient part is the actual test input:

input='.nf
\X#bogus1: esc \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]#
.device bogus1: req \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
.ec @
@X#bogus2: esc @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]#
.device bogus2: req @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]'

...which looks pretty noisy but tests several things.

1. Use of \X escape sequences versus `device` requests.
2. Use of \% escape sequences in device control commands (do they get removed?).
3. Use of ordinary hyphens in device control commands (do they get converted to some crazy Unicode thing?).
4. Use of special character escape sequences to represent ASCII characters in device control commands and which should therefore be passed through as ASCII.
5. Robustness in the face of a changed roff escape character.

This did *not* work prior to the bug #64484 fix.

> This bug was previously named "warning messages when using special characters in TITLE or AUTHOR" and the attached cyrillic.pdf shows both the pdf title and author shown with cyrillics and no warnings. So I would say this one is dependent on bug #65098, i.e. merge the rest of my branch.

I hear your expression of urgency but I don't think "stringhex" is a good long-term solution to what ails us. You are correct in comment #22 that I did not correctly apprehend at first what it was for.
I thought you developed it because we had no way to reliably transmit arbitrary byte sequences to device control commands. But we did, sort of--it just needed to be made consistent and reliable. That it wasn't is what my test case attempts to illustrate and what the fix to bug #64484 attempts to prove.

No, I accept your premise that the main driver behind "stringhex" was this:

> The problem lies in the original pdfmark API, if you look at the pdfmark.pdf you will see that in the sections describing .pdfhref M and .pdfhref L which both refer to a "dest-name" and "descriptive text", it says that if a dest-name is not given the first word in the description is used as the dest-name.

I appreciate your explanation. If the problem was with the pdfmark API, then let's fix the pdfmark API. In particular, this:

> if a dest-name is not given the first word in the description is used as the dest-name

...strikes me as short-sighted, especially without any validation going on. A textual description of a hyperlink/bookmark might contain all sorts of crazy stuff. (Like Cyrillic or CJK characters or, worse, motion or type-size or font-selection escape sequences.) Assuming that it was going to be a well-behaved sequence of ASCII bytes or even that one could "sanitize" or "cln" one's way through was a hopeless notion. That won't be practical until we have a string iterator and more conditional expressions that enable the user of an iterator to identify the type of each item in an iterated string/macro/diversion.

But if I understand you correctly, we don't need that fancy new stuff to solve the present problem, with stringhex or without.

It would probably benefit me to look up Peter's documentation on _mom_'s "HEADING" macro. It is a bit baffling to me that one has to repeat arguments like this:

.HEADING 1 NAMED Гуляйпольщина "Гуляйпольщина"
...
.PDF_LINK Гуляйпольщина PREFIX ( SUFFIX ) "see: +"

> Where the "+" is replaced by the contents of the string register pdf:look(Гуляйпольщина), which would actually be a string of \[uXXXX] nodes, so would generate an error. This is what stringhex is for, to hide the contents so that groff does not see it as a sequence of nodes. The ideal solution would be to allow string registers to have an attribute (say "glass") which signals that groff should never try to interpret its contents, i.e. operate as if the escape mechanism was turned off just for the contents of that register, and have a way of turning that attribute on/off or an escape which sets the attribute for the enclosed string.

Right now I don't understand why we would need to elaborate a fairly fundamental *roff language data type (the string) with a "glass" attribute when, if you have a list indexed by a number or a _valid_ identifier, you can simply define a string using a list item's index as a prefix.

.nr refno 1
.de DEFREF
. nr refno +1
. ds ref*id!\\n[refno]!tag \\$1
. ds ref*id!\\n[refno]!author \\$2
. ds ref*id!\\n[refno]!desc \\$3
. ds ref*id!\\n[refno]!year \\$4
..
.DEFREF story "Dupr\[e aa]" "Best \%Story\%Book Ever" 1989

That's a simplified example of how macro packages have been implementing arrays of data structures for decades, complete with idioms for "*" and "!", which are not imposed by the language in any way.

Maybe I'm missing something.

As it happens, this bug is probably fixed, too--I simply need to come up with a convincing acceptance criterion for it. A bit tough without adding a feature to an existing output driver.

I trust it's obvious that, with appropriate escaping, one can transmit "\000\001..\377" or "\x00..\xff" or "\[u0000]..\[u00FF]".

I will try to make some time to reply to comment #22 more thoughtfully soon. Leaving in "Need Info" status and assigned to myself for that reason.
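
P.S. For anyone who wants to eyeball what the test input quoted earlier produces, here is a rough sketch. It is not the test script itself. It assumes a groff executable on $PATH and uses the -Z option to skip the postprocessor, so that troff's intermediate output is visible; device control commands appear there as lines starting with "x X". The grep pattern and the fallback messages are assumptions chosen for illustration.

```sh
#!/bin/sh
# Sketch only: show the device control lines troff emits for the test input.
# Assumes a groff executable on $PATH; this is not the test script itself.

input='.nf
\X#bogus1: esc \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]#
.device bogus1: req \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
.ec @
@X#bogus2: esc @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]#
.device bogus2: req @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]'

if command -v groff >/dev/null 2>&1; then
    # -Z suppresses the postprocessor, leaving the intermediate output,
    # in which device control commands are lines beginning with "x X".
    printf '%s\n' "$input" | groff -Z 2>/dev/null | grep '^x X bogus' ||
        echo 'no "x X bogus" lines in the output' >&2
else
    echo 'groff not found; nothing to show' >&2
fi
```

If the handling is consistent, I would expect the four lines printed to differ only in their "bogus1"/"bogus2" and "esc"/"req" tags; any other difference between them points at one of the five items in the list above.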

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63074>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/