Hi Philip,

Thanks for the quick response. I certainly wasn't expecting to find an ancient bug like this. Should I be reporting this bug upstream, or are you planning on upstreaming a diff?

Regards,

Jordan



On 2020-06-06 20:16, Philip Guenther wrote:
On Sat, Jun 6, 2020 at 5:08 PM Zé Loff <zel...@zeloff.org> wrote:

    On Sat, Jun 06, 2020 at 03:51:58PM -0700, Jordan Geoghegan wrote:
    > I'm working on a simple awk snippet to take the IP range data listed in
    > the Extended Delegation Statistics data from ARIN [1] and convert it into
    > CIDR blocks. I have a snippet that works perfectly fine on mawk and gawk,
    > but not on the base system awk. I'm 99% sure I'm not using any GNUisms, as
    > when I break the command up into two parts, it works perfectly.
    >
    > The snippet below does not work with base awk, but does work with gawk and
    > mawk (running on a 6.6 -stable system):
    >
    >   awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-arin-extended-latest.txt
    >
    > The command does output data, but it also throws errors for certain lines:
    >
    >   awk: log result out of range
    >   input record number 94027, file delegated-arin-extended-latest.txt
    >   source line number 1
    >
    > Most CIDR blocks are calculated correctly, but about 10% of them have errors
    > (i.e., something that should be calculated as a /24 is instead calculated as
    > a /30).

...
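
For reference, the arithmetic in the quoted awk one-liner (prefix length = 32 - log2(address count)) can also be done without floating point. The following is only a sketch in C, not part of the original thread: the helper name prefix_len_from_count is hypothetical, and it assumes the count in field 5 is a power of two, returning -1 when it is not.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the thread): return the IPv4 prefix
 * length for a block of `count` addresses, or -1 if `count` is not a
 * power of two in the range [1, 2^32].
 */
int prefix_len_from_count(uint64_t count)
{
	int bits = 0;

	/* Reject zero, oversized blocks, and non-powers-of-two. */
	if (count == 0 || count > (UINT64_C(1) << 32) ||
	    (count & (count - 1)) != 0)
		return -1;

	/* Count how many times the block size can be halved. */
	while (count > 1) {
		count >>= 1;
		bits++;
	}
	return 32 - bits;
}
```

With the record discussed further down, a count of 4096 gives a prefix length of 20, i.e. 216.250.144.0/20. Integer shifts sidestep both log() and any rounding in the %d conversion.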

    I have no idea about what is going on, but FWIW I can reproduce this on
    i386 6.7-stable and amd64 6.7-current (well, current-ish, #232).
    Truncating the file to a single offending line produces the same result:
    log($5) is out of range.

    It appears to have something to do with the last field. Removing it or
    changing some of its characters seems to work, e.g.:

    arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5e58386636aa775c2106140445cf2c30
    arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5a58386636aa775c2106140445cf2c30
                                                        ^
    Fails on the first line but works on the second.


Hah!  Nice observation!

The last field of the first line looks kinda like a number in scientific notation (its leading 5e58386636 is far beyond what a double can hold), so when awk internally tries to set up the fields, the conversion fails with ERANGE, and the global errno variable is left with that value.  Several builtins in awk, including log(), perform operations and then check whether errno is set to EDOM or ERANGE, but fail to clear errno beforehand, so a perfectly good log() call gets blamed for the earlier error.

The fix is to zero errno before each code sequence that uses the errcheck() function, à la:

--- run.c       13 Aug 2019 10:45:56 -0000      1.44
+++ run.c       7 Jun 2020 03:14:38 -0000
@@ -26,6 +26,7 @@ THIS SOFTWARE.
 #define DEBUG
 #include <stdio.h>
 #include <ctype.h>
+#include <errno.h>
 #include <setjmp.h>
 #include <limits.h>
 #include <math.h>
@@ -1041,8 +1042,10 @@ Cell *arith(Node **a, int n)     /* a[0] + a
        case POWER:
                if (j >= 0 && modf(j, &v) == 0.0)       /* pos integer exponent */
                        i = ipow(i, (int) j);
-               else
+               else {
+                       errno = 0;
                        i = errcheck(pow(i, j), "pow");
+               }
                break;
        default:        /* can't happen */
                FATAL("illegal arithmetic operator %d", n);
@@ -1135,8 +1138,10 @@ Cell *assign(Node **a, int n)    /* a[0] =
        case POWEQ:
                if (yf >= 0 && modf(yf, &v) == 0.0)     /* pos integer exponent */
                        xf = ipow(xf, (int) yf);
-               else
+               else {
+                       errno = 0;
                        xf = errcheck(pow(xf, yf), "pow");
+               }
                break;
        default:
                FATAL("illegal assignment operator %d", n);
@@ -1499,12 +1504,15 @@ Cell *bltin(Node **a, int n)    /* builtin
                        u = strlen(getsval(x));
                break;
        case FLOG:
+               errno = 0;
                u = errcheck(log(getfval(x)), "log"); break;
        case FINT:
                modf(getfval(x), &u); break;
        case FEXP:
+               errno = 0;
                u = errcheck(exp(getfval(x)), "exp"); break;
        case FSQRT:
+               errno = 0;
                u = errcheck(sqrt(getfval(x)), "sqrt"); break;
        case FSIN:
                u = sin(getfval(x)); break;


Todd, are we up to date with upstream, or is this latent there too?


Philip Guenther

