URL: <https://savannah.gnu.org/bugs/?64576>
Summary: [pdf.tmac] pdf*href option handling insufficiently flexible
Group: GNU roff
Submitter: gbranden
Submitted: Mon 21 Aug 2023 09:50:40 AM UTC
Category: Macro - others/general
Severity: 3 - Normal
Item Group: Incorrect behaviour
Status: In Progress
Privacy: Public
Assigned to: gbranden
Open/Closed: Open
Discussion Lock: Any
Planned Release: None

_______________________________________________________

Follow-up Comments:

-------------------------------------------------------
Date: Mon 21 Aug 2023 09:50:40 AM UTC By: G. Branden Robinson <gbranden>

This code:

    .\"
    .\" Macros "pdf:href.flag" and "pdf:href.option"
    .\" provide a generic mechanism for switching on flag type options,
    .\" and for decoding options with arguments, respectively
    .\"
    .de pdf:href.flag
    .\" ----------------------------------------------------------------------
    .\" ----------------------------------------------------------------------
    .nr pdf:href\\$1 1
    .nr pdf:href.argc 1
    ..
    .de pdf:href.option
    .\" ----------------------------------------------------------------------
    .\" ----------------------------------------------------------------------
    .ds pdf:href\\$1 \\$2
    .nr pdf:href.argc 2

...is insufficiently flexible. It assumes that its inputs will consist only of ordinary characters, but special characters and escape sequences, particularly for callers of `pdf:href.option`, are conceivable.

For example, a macro like _groff man_(7)'s `UR`, when used with no link text (which is a bit lazy, but accepted), will run into problems in cases like the following.

    .P
    .I ps2eps
    is available from CTAN mirrors, e.g.,
    .UR ftp://\:ftp\:.dante\:.de/\:tex\-archive/\:support/\:ps2eps/
    .UE .

That's a real example from our _pic_(1) page.
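For illustration only, the failure mode can be sketched outside of _pdf.tmac_. The following is a minimal sketch, not the package's code: the `demo*flag` macro and the `demo*seen` register prefix are made-up names that merely imitate the `pdf:href.flag` pattern of appending a macro argument to an identifier.

```roff
.\" Hypothetical stand-in for the pattern at issue; the "demo*" names
.\" are made up for this sketch and do not exist in pdf.tmac.
.de demo*flag
.  nr demo*seen\\$1 1
..
.\" Fine: "-D" contains only ordinary characters.
.demo*flag -D
.tm demo*seen-D is \n[demo*seen-D]
.\" Not fine: "\:" lands in the middle of a register name.
.demo*flag ftp://\:ftp\:.dante\:.de/
```

Run through something like `groff -z`, the first call should define the register quietly, while the second is expected to draw a diagnostic about the escape sequence in the identifier; the exact wording of that message is not asserted here.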

One approach to resolving it implies laboriously walking the arguments to macros that call `pdf:href.flag` and `pdf:href.option` (which are internals--not externally documented and therefore not an API), attempting to scrub them of unexpected content, and getting peevish with other _groff_ developers when encountering arbitrary _roff_ input that is *unexpectedly* unexpected; see, e.g., bug #64202.

That it is so tedious to iterate through strings in _groff_ (and as I have said elsewhere, nigh-impossible in AT&T _troff_) is doubtless one of the factors that turns up the temperature on this problem. See bug #62264 for a proposed, but not yet implemented, quality-of-life improvement in this area.

Another possibility is simply for _pdf.tmac_- or _pdfmark.tmac_-using documents and macro packages to be aware of the intolerance/irritability of those packages' internals, and work around it--for instance, _groff_'s _an.tmac_, when seeing that a `UR` or `MT` has no link text, could simply inject some known, well-behaved link text like "(link)" that the aforementioned internals won't barf on.

This works (I tried it), but it is pretty lame.

1. That text isn't localized.
2. That text might not be appropriate or clear in all situations.

Now, one _could_ kick both of the above back into the user's face. ("Just supply some link text, damn it!") But for another problem...

3. Worst, you can't format punctuation after it without intervening space. To do that, you need the `\c` escape sequence, which becomes part of one of `pdfhref`'s arguments, and _pdfmark.tmac_ / _pdf.tmac_ insist on populating _roff_ register or string names incorporating each such argument, and we're back to the original problem of escape sequences.

    troff:<standard input>:1473: error: an escaped 'c' is not allowed in an identifier

And in fact use of `\c` is wholly defeated here--you'll get space (and possibly a break) before the punctuation anyway.

So tossing the burden of specifying link text--which is supposed to be formatted output in the first place--on the user and then going aggro on them if they dare to use escape sequences that are wholly valid in formatted output is not a satisfactory solution.

Intriguingly, the `\A` escape sequence to test a character sequence for validity as a _groff_ identifier name has been around since 1991, but _pdfmark.tmac_ and _pdf.tmac_ don't bother to use it. Possibly this problem would have been recognized and addressed long ago if they had. It certainly seems to me like a Recommended Best Practice if one is going to be populating _groff_ identifiers based on user input (or even _any_ external input, like a macro package written by someone who isn't as careful as you are). But nobody ever got a fellowship for validating input, did they?

Moreover, it appears that the main reason _pdfmark.tmac_ / _pdf.tmac_ are taking this approach is that the _roff_ language doesn't have a list type, so it's a pain in the ass to search for things. _pdfmark.tmac_ / _pdf.tmac_'s solution, to use the macro/request/string name space as a dictionary, with the identifiers as keys and the string contents as values, does have obvious appeal given that limitation...but for blundering into the other limitations of assuming either that (a) any input makes a valid identifier, or (b) your users won't wander off the lit path of ordinary characters.

And as noted above, scrubbing a character sequence for things that are invalid (in _any_ context)--the "sanitization problem"--is Yet Another pain in the ass. See bug #62264 again.

Fortunately, the use of this mechanism, in _pdf.tmac_ at least, appears to be fairly limited. `pdf:href.flag` would seem to be okay, since its values only ever come from macro arguments that identify "flags", and these are going to have straightforward names.

For instance, these seem okay (includes annotations from my working copy).

    .\" XXX: predefined flag
    .if !dpdf:href-D .pdf:href.option -D \\$1
    .if '\\*[pdf:href-D]'' \{\
    .   pdf:error pdfhref has no destination
    .   nr pdf:href.ok 0
    .   \}

    .\" XXX: predefined flag
    .if dpdf:href-P \&\\*[pdf:href-P]\c
    .ie \\n[pdf:href.ok] \{\
    .   \" [~40 lines of brace scope follow]

No, the problem seems to be limited to eating what, on the Unix command line, we'd call operands and option arguments, but which can be URLs with escape sequences like `\:` and `\c` in them, and spitting them verbatim into suffixes on _roff_ identifiers, and that just doesn't work in general.

    .  \"
    .  \" Handle the case where subcommand is specified as "-class",
    .  \" setting up appropriate macro aliases for subcommand handlers.
    .  \"
    .\" XXX
    .  if dpdf*href\\$1 .als pdf*href pdf*href\\$1
    .  if dpdf*href\\$1.link .als pdf*href.link pdf*href\\$1.link
    .  if dpdf*href\\$1.file .als pdf*href.file pdf*href\\$1.file
    .  \"
    .  \" Repeat macro alias setup
    .  \" for the case where the subcommand is specified as "class",
    .  \" (without a leading hyphen)
    .  \"
    .\" XXX
    .  if dpdf*href-\\$1 .als pdf*href pdf*href-\\$1
    .  if dpdf*href-\\$1.link .als pdf*href.link pdf*href-\\$1.link
    .  if dpdf*href-\\$1.file .als pdf*href.file pdf*href-\\$1.file

An immense amount of code in _pdf.tmac_ seems to be dedicated to an exploration of the question "hey, what if we chucked established _roff_ programming idioms out the window and re-implemented _getopt_long_(3) in it so that shell script programmers had macro interfaces that looked vaguely familiar?"
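
Returning to the `\A` escape sequence mentioned earlier: a minimal sketch of the kind of guard it permits follows. This is an illustration under assumptions, not a proposed patch; `demo*option` and the `demo*opt` string prefix are hypothetical names that only mimic the shape of `pdf:href.option`, and the sketch relies on `\A` interpolating 0 when its argument contains an escape sequence such as `\:`.

```roff
.\" Hypothetical helper; the "demo*" names are made up for this sketch
.\" and do not exist in pdf.tmac or pdfmark.tmac.
.de demo*option
.  \" Form the identifier only if it passes the validity test.
.  ie \A'demo*opt\\$1' .ds demo*opt\\$1 \\$2
.  el .tm demo*option: refusing to form an identifier from '\\$1'
..
.\" A well-behaved option name; the value may still carry escapes,
.\" because it is stored as string contents, not used as a name.
.demo*option -D ftp://\:ftp\:.dante\:.de/
.\" An argument with escape sequences where a name is expected.
.demo*option ftp://\:ftp\:.dante\:.de/ ignored
```

The point of the design is that the check happens before any request tries to read the identifier, so the caller gets one message from the macro package rather than a diagnostic from the formatter. It does nothing about the harder question of what to do with the rejected argument.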