On Mon, 13 Jul 2020 13:02:44 +0200, Jan Stary wrote:

> This is current/amd64.
>
> On UTF input, awk segfaults when using a multi-character RS:
>
> $ cat /tmp/in
> č
>
> $ hexdump -C /tmp/in
> 00000000  c4 8d 0a                                        |...|
> 00000003
>
> $ cat /tmp/in | awk '{print$1}'
> č
>
> $ cat /tmp/in | awk -v RS=x '{print$1}'
> č
>
> $ cat /tmp/in | awk -v RS=xy '{print$1}'
> Segmentation fault (core dumped)

Nice catch.  The bug is that a plain char, which is signed on amd64,
is used as an index into the gototab array.  Any byte with the high
bit set, such as the 0xc4 in your input, becomes a negative index.
Once debugged, the fix is simple.
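
To make the failure mode concrete, here is a minimal standalone sketch
of the conversion involved.  It is not code from awk: the helper names
index_before() and index_after() are made up for illustration, and
NCHARS is assumed to be 256 here (one column per byte value).

```c
#include <assert.h>

#define NCHARS 256	/* assumed table width: one column per byte value */

/* What the unpatched assignment yields where plain char is signed. */
int
index_before(signed char byte)
{
	return byte;			/* 0xc4 is stored as -60 */
}

/* What the patched assignment yields: always in 0..NCHARS-1. */
int
index_after(signed char byte)
{
	int c = (unsigned char)byte;	/* -60 becomes 196 (0xc4) */

	assert(c >= 0 && c < NCHARS);
	return c;
}
```

With the two bytes of the test input, index_before() gives -60 and
-115, while index_after() gives 196 and 141, which are valid columns.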

 - todd

diff --git a/b.c b/b.c
index c167b50..f7fbc0e 100644
--- a/b.c
+++ b/b.c
@@ -684,7 +684,7 @@ bool fnematch(fa *pfa, FILE *f, char **pbuf, int *pbufsize, int quantum)
                                                FATAL("stream '%.30s...' too long", buf);
                                buf[k++] = (c = getc(f)) != EOF ? c : 0;
                        }
-                       c = buf[j];
+                       c = (unsigned char)buf[j];
                        /* assert(c < NCHARS); */
 
                        if ((ns = pfa->gototab[s][c]) != 0)
