Re: Comparison against backslash obtained via .substring

onf Sun, 03 Nov 2024 00:50:26 -0700

Hi Branden,

Thank you for your patience with me :)
I have merged my replies to all of your messages here.


On Sun Nov 3, 2024 at 4:36 AM CET, G. Branden Robinson wrote:
> At 2024-11-03T03:56:23+0100, onf wrote:
> > Ugh, I should have taken more time to reply -- I missed the fact that
> > groff doesn't consider DEL a control character. Thanks for the hack,
> > it works...
>
> Glad to hear it!  Let me put on a familiar hat and suggest that you use
> different terminology, though.
>
> In *roff, a "control character" is something that the formatter
> recognizes as starting a "control line".  Using the same term to refer
> to properties of characters from the encoding your system uses can lead
> to confusion.  [...]

Agreed. I am aware of troff's concept of a control character; I just
hadn't realized that it shares its name with the concept used by
ASCII, POSIX ERE character classes, ECMA-48 (iirc), and so on.

> [...]
>
> The reason DEL works is not because it is or isn't a control character,
> but because it's a valid input character, like ^B, ^C, and several
> others.  (Historically, ^G was popular in attempts to avoid the problem
> in the next paragraph.)
> [...]

I tried using a "control" (non-printing) character because I can be
sure it won't occur in the input, not because I thought groff treats
it specially.

On Sun Nov 3, 2024 at 3:46 AM CET, G. Branden Robinson wrote:
> At 2024-11-03T02:53:14+0100, onf wrote:
> > changing the escape character hasn't occurred to me, that's clever!
> > Unfortunately it doesn't work -- groff won't allow me to set the
> > escape character to a control one,
> 
> It does, but it has to be a valid input character.
> 
> groff(7):
>        On a machine using the ISO 646, 8859, or 10646 character
>        encodings, invalid input characters are 0x00, 0x08, 0x0B,
>        0x0D–0x1F, and 0x80–0x9F.  On an EBCDIC host, they are 0x00–0x01,
>        0x08, 0x09, 0x0B, 0x0D–0x14, 0x17–0x1F, and 0x30–0x3F.  Some of
>        these code points are used by troff internally, making it non‐
>        trivial to extend the program to accept UTF‐8 or other encodings
>        that use characters from these ranges.
> [...]

Thank you for pointing to that. I now remember having seen this before,
but I had completely forgotten about it since. It seems I just picked
the wrong non-printing characters.
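
For the archive, here is roughly what the escape-character approach
looks like as I understand it. This is only a sketch: the string names
`s` and `ch` are made up for the example, and `@` stands in for a
non-printing but valid input character (such as DEL) so that the
listing stays readable.

```roff
.\" Store a string containing one literal backslash; .ds reads its
.\" argument in copy mode, where a doubled backslash becomes a single one.
.ds s a\\b
.\" Copy it and keep only the character at index 1, i.e. the backslash.
.ds ch \*[s]
.substring ch 1 1
.\" Make @ the escape character for a single comparison, so that the
.\" backslash is an ordinary character, then restore the default escape
.\" character with a bare .ec.  (No comments in between: while @ is the
.\" escape character, this comment syntax is not recognized.)
.ec @
.if '@*[ch]'\' .tm groff: backslash
.ec
```

With a printable stand-in like `@`, the trick is of course only safe as
long as that character cannot occur in the input, which is why I wanted
a non-printing one in the first place.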

On Sun Nov 3, 2024 at 4:19 AM CET, G. Branden Robinson wrote:
> At 2024-11-03T03:25:01+0100, onf wrote:
> > [...] Adding a string iterator would fix this, although it would make
> > my code significantly more complex as I would have to compare the
> > strings character by character [...]
> 
> One reason to have the string(/macro/diversion) iterator request is that
> as soon as we do, we can use it to construct a "string library" macro
> package.  "string.tmac" seems like a likely name.
>
> What I envision is removing several of the string-handling requests from
> GNU troff and replacing them with macros in "string.tmac". [...]
> "string.tmac" would also be a useful place to experiment with things
> like:
>       .strchr
>       .strrchr
>       .index
>       .rindex
>       .slice (return a substring using Python-esque indexing)
> And maybe AWK-like replacement macros:
>       .sub
>       .gsub
> [...]

Yup, I hadn't realized the possibilities of such an approach. Seems
like a great idea! I would suggest replacing strchr with strpbrk though
-- it's useful to be able to look for more than a single character.
Speaking of which, having Unicode-capable ctype macros would be quite
helpful too. (I have recently written code that would really benefit
from an ispunct macro.)

> > A problem with this solution is that it's incomplete. It addresses a
> > particular issue arising from troff's usage of macro substitution,
> > but doesn't solve the others. For instance, I would still run into
> > issues if I tried to compare a literal ' against anything and
> > delimited the comparands by the same character, which can happen with
> > the proposed iterator mechanism:
> >   .ie '\\*[ch]'"' \" ...
> 
> We can't verify or refute that claim until the code is in place, but I
> expect that you are wrong about this, unless you run the formatter in
> AT&T compatibility mode (in which case the syntax `\*[ch]` won't work
> anyway).
>
> info '(groff) Compatibility Mode':
> 
>      Normally, GNU 'troff' preserves the interpolation depth in
>   delimited arguments, but not in compatibility mode.
> 
>        .ds xx '
>        \w'abc\*(xxdef'
>            => 168 (normal mode on a terminal device)
>            => 72def' (compatibility mode on a terminal device)
> 
> >   $ groff -b -ww -z
> >   .ds str "'\"
> >   .ie '\*[str]'\'' .tm groff: single quote
> >   .el .tm groff: else
> >   groff: else
> 
> This fails because `\` does not escape the apostrophe the way you think
> it does.  `\'` is a special character escape sequence.
> [...]
> You'll need to do this a slightly different way when attempting to match
> a character that happens to be the same as the delimiter in a formatted
> output comparison.  One layer of indirection will do.
> [...]

Thanks for taking the time to explain this. I think I had somewhat
assumed it works this way, but the fact that substituting in the escape
character from a string (as in my first message) is capable of escaping
the comparand delimiter really confused me. I still don't get how that's
possible if groff preserves the interpolation depth as you say...
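
For reference, my reading of the suggested layer of indirection is
something like the sketch below, where the apostrophe to match against
is itself stored in a string. The name `apos` is made up, and the
expected behaviour rests on what the manual says about interpolation
depth rather than on anything I can vouch for.

```roff
.\" str is the value under test; apos (hypothetical name) holds the
.\" character to compare against.
.ds str '
.ds apos '
.\" Both apostrophes arrive via string interpolation, so in normal
.\" (non-compatibility) mode neither should be taken for the delimiter
.\" that separates the comparands.
.ie '\*[str]'\*[apos]' .tm groff: single quote
.el .tm groff: else
```

If the interpolation depth is preserved as described, I would expect the
first branch to be taken here.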

I will be happy to be wrong about any possible issues, though :)

~ onf
