Hi Branden, thank you for having patience with me :) I have joined all the replies here.

On Sun Nov 3, 2024 at 4:36 AM CET, G. Branden Robinson wrote:
> At 2024-11-03T03:56:23+0100, onf wrote:
> > Ugh, I should have taken more time to reply -- I missed the fact that
> > groff doesn't consider DEL a control character. Thanks for the hack,
> > it works...
>
> Glad to hear it!  Let me put on a familiar hat and suggest that you use
> different terminology, though.
>
> In *roff, a "control character" is something that the formatter
> recognizes as starting a "control line".  Using the same term to refer
> to properties of characters from the encoding your system uses can lead
> to confusion.
> [...]

Agreed. I am aware of troff's concept of a control character; I just
hadn't realized it's called exactly the same as the concept used by
ASCII, POSIX ERE character classes, ECMA-48 (iirc) and so on.

> [...]
>
> The reason DEL works is not because it is or isn't a control character,
> but because it's a valid input character, like ^B, ^C, and several
> others.  (Historically, ^G was popular in attempts to avoid the problem
> in the next paragraph.)
> [...]

The reason I tried using a "control" (non-printing) character is that
I can be sure it's not going to occur in input, not that I thought
groff treats it specially.

On Sun Nov 3, 2024 at 3:46 AM CET, G. Branden Robinson wrote:
> At 2024-11-03T02:53:14+0100, onf wrote:
> > changing the escape character hasn't occurred to me, that's clever!
> > Unfortunately it doesn't work -- groff won't allow me to set the
> > escape character to a control one,
>
> It does, but it has to be a valid input character.
>
> groff(7):
>     On a machine using the ISO 646, 8859, or 10646 character
>     encodings, invalid input characters are 0x00, 0x08, 0x0B,
>     0x0D–0x1F, and 0x80–0x9F.  On an EBCDIC host, they are 0x00–0x01,
>     0x08, 0x09, 0x0B, 0x0D–0x14, 0x17–0x1F, and 0x30–0x3F.
>     Some of these code points are used by troff internally, making it
>     non‐trivial to extend the program to accept UTF‐8 or other
>     encodings that use characters from these ranges.
> [...]

Thank you for pointing that out. I now remember that I had already seen
this, but had completely forgotten about it since. Seems I just picked
the wrong non-printing characters.

On Sun Nov 3, 2024 at 4:19 AM CET, G. Branden Robinson wrote:
> At 2024-11-03T03:25:01+0100, onf wrote:
> > [...] Adding a string iterator would fix this, although it would make
> > my code significantly more complex as I would have to compare the
> > strings character by character [...]
>
> One reason to have the string(/macro/diversion) iterator request is that
> as soon as we do, we can use it to construct a "string library" macro
> package.  "string.tmac" seems like a likely name.
>
> What I envision is removing several of the string-handling requests from
> GNU troff and replacing them with macros in "string.tmac".
> [...]
> "string.tmac" would also be a useful place to experiment with things
> like:
>   .strchr
>   .strrchr
>   .index
>   .rindex
>   .slice (return a substring using Python-esque indexing)
> And maybe AWK-like replacement macros:
>   .sub
>   .gsub
> [...]

Yup, I hadn't realized the possibilities with such an approach. Seems
like a great idea! I would suggest replacing strchr with strpbrk though
-- it's useful to be able to look for more than a single character.

Speaking of which, having Unicode-capable ctype macros would be quite
helpful too. (I have recently written code that would really benefit
from an ispunct macro.)

> > A problem with this solution is that it's incomplete. It addresses a
> > particular issue arising from troff's usage of macro substitution,
> > but doesn't solve the others.
> > For instance, I would still run into
> > issues if I tried to compare a literal ' against anything and
> > delimited the comparands by the same character, which can happen with
> > the proposed iterator mechanism:
> >   .ie '\\*[ch]'"' \" ...
>
> We can't verify or refute that claim until the code is in place, but I
> expect that you are wrong about this, unless you run the formatter in
> AT&T compatibility mode (in which case the syntax `\*[ch]` won't work
> anyway).
>
> info '(groff) Compatibility Mode':
>
>     Normally, GNU 'troff' preserves the interpolation depth in
>     delimited arguments, but not in compatibility mode.
>
>       .ds xx '
>       \w'abc\*(xxdef'
>           => 168 (normal mode on a terminal device)
>           => 72def' (compatibility mode on a terminal device)
>
> > $ groff -b -ww -z
> > .ds str "'\"
> > .ie '\*[str]'\'' .tm groff: single quote
> > .el .tm groff: else
> > groff: else
>
> This fails because `\` does not escape the apostrophe the way you think
> it does.  `\'` is a special character escape sequence.
> [...]
> You'll need to do this a slightly different way when attempting to match
> a character that happens to be the same as the delimiter in a formatted
> output comparison.  One layer of indirection will do.
> [...]

Thanks for taking the time to explain this. I think I had somewhat
assumed it works this way, but the fact that substituting in the escape
character from a string (as in my first message) is capable of escaping
the comparand delimiter really confused me. I still don't get how that's
possible if groff preserves the interpolation depth as you say...

I will be happy to be wrong about any possible issues, though :)

~ onf
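
P.S. To avoid picking the wrong non-printing character again, here is a
rough sketch (in Python, not groff) of the rule from the groff(7) excerpt
quoted above for ISO 646/8859/10646 hosts. The function and constant
names are my own invention and not part of groff. It only says whether a
code point is accepted as input, not whether it is a sensible choice of
escape character (tab and newline pass the check, for example).

```python
# Invalid input code points on ISO 646/8859/10646 hosts, transcribed from
# the groff(7) excerpt quoted in this thread. The names below are
# hypothetical helpers for illustration, not anything groff provides.
INVALID_INPUT = (
    {0x00, 0x08, 0x0B}
    | set(range(0x0D, 0x20))  # 0x0D-0x1F
    | set(range(0x80, 0xA0))  # 0x80-0x9F
)


def is_valid_groff_input(code_point: int) -> bool:
    """Return True if an 8-bit code point is not in the invalid set."""
    if not 0 <= code_point <= 0xFF:
        raise ValueError("expected an 8-bit code point")
    return code_point not in INVALID_INPUT


if __name__ == "__main__":
    candidates = [("^B", 0x02), ("^G", 0x07), ("BS", 0x08),
                  ("ESC", 0x1B), ("DEL", 0x7F)]
    for name, cp in candidates:
        verdict = "valid" if is_valid_groff_input(cp) else "invalid"
        print(f"{name} (0x{cp:02X}): {verdict}")
```

This agrees with what Branden said: ^B, ^G and DEL fall outside the
invalid ranges, while BS and ESC fall inside them.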