Hi Philip,
Thanks for the quick response. I certainly wasn't expecting to find an
ancient bug like this. Should I be reporting this bug upstream, or are
you planning on upstreaming a diff?
Regards,
Jordan
On 2020-06-06 20:16, Philip Guenther wrote:
On Sat, Jun 6, 2020 at 5:08 PM Zé Loff <zel...@zeloff.org> wrote:
On Sat, Jun 06, 2020 at 03:51:58PM -0700, Jordan Geoghegan wrote:
> I'm working on a simple awk snippet to convert the IP range data listed in
> ARIN's Extended Delegation Statistics [1] into CIDR blocks. I have a
> snippet that works perfectly fine on mawk and gawk, but not on the base
> system awk. I'm 99% sure I'm not using any GNUisms, as when I break the
> command up into two parts, it works perfectly.
>
> The snippet below does not work with base awk, but does work with gawk and
> mawk (running on a 6.6-stable system):
>
> awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-arin-extended-latest.txt
>
>
> The command does output data, but it also throws errors for certain lines:
>
> awk: log result out of range
> input record number 94027, file delegated-arin-extended-latest.txt
> source line number 1
>
> Most CIDR blocks are calculated correctly, but about 10% of them have
> errors (i.e. something that should be calculated as a /24 is instead
> calculated as a /30).
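For reference, the snippet's arithmetic is prefix = 32 - log2(count): a count of 256 addresses gives a /24 and 4096 gives a /20. Below is a minimal C sketch of the same conversion using integer bit-shifting instead of floating-point log(), so no libm call (and no errno) is involved. `cidr_prefix_len` is a hypothetical helper, and it assumes the count field has already been parsed into an integer; it returns -1 when the count is not a power of two, because such a range cannot be expressed as a single CIDR block.

```c
#include <stdint.h>

/* Hypothetical helper: returns the CIDR prefix length for an IPv4 block
 * of `count` addresses (32 - log2(count)), or -1 if `count` is not a
 * power of two between 1 and 2^32. */
int cidr_prefix_len(uint64_t count)
{
	/* Reject 0, anything larger than the whole IPv4 space, and
	 * non-powers of two (count & (count - 1) clears the lowest set bit). */
	if (count == 0 || count > (UINT64_C(1) << 32) ||
	    (count & (count - 1)) != 0)
		return -1;

	int bits = 0;
	while (count > 1) {	/* number of right shifts == log2(count) */
		count >>= 1;
		bits++;
	}
	return 32 - bits;
}
```

For example, `cidr_prefix_len(4096)` returns 20 and `cidr_prefix_len(768)` returns -1.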
...
> I have no idea what is going on, but FWIW I can reproduce this on i386
> 6.7-stable and amd64 6.7-current (well, current-ish, #232). Truncating
> the file to a single offending line produces the same result: log($5)
> is out of range.
>
> It appears to have something to do with the last field. Removing it or
> changing some of its characters seems to work, e.g.:
>
> arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5e58386636aa775c2106140445cf2c30
> arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5a58386636aa775c2106140445cf2c30
>                                                     ^
> Fails on the first line but works on the second.
Hah! Nice observation!
The last field of the first line looks kinda like a number in
scientific notation, but when awk internally tries to set up the
fields it generates an ERANGE error...and the global errno variable is
left with that value. Several builtins in awk, including log(),
perform operations and then check whether errno is set to EDOM or
ERANGE but fail to clear errno beforehand.
The fix is to zero errno before all the code sequences that use the
errcheck() function, à la:
--- run.c 13 Aug 2019 10:45:56 -0000 1.44
+++ run.c 7 Jun 2020 03:14:38 -0000
@@ -26,6 +26,7 @@ THIS SOFTWARE.
 #define DEBUG
 #include <stdio.h>
 #include <ctype.h>
+#include <errno.h>
 #include <setjmp.h>
 #include <limits.h>
 #include <math.h>
@@ -1041,8 +1042,10 @@ Cell *arith(Node **a, int n) /* a[0] + a
 	case POWER:
 		if (j >= 0 && modf(j, &v) == 0.0)	/* pos integer exponent */
 			i = ipow(i, (int) j);
-		else
+		else {
+			errno = 0;
 			i = errcheck(pow(i, j), "pow");
+		}
 		break;
 	default:	/* can't happen */
 		FATAL("illegal arithmetic operator %d", n);
@@ -1135,8 +1138,10 @@ Cell *assign(Node **a, int n) /* a[0] =
 	case POWEQ:
 		if (yf >= 0 && modf(yf, &v) == 0.0)	/* pos integer exponent */
 			xf = ipow(xf, (int) yf);
-		else
+		else {
+			errno = 0;
 			xf = errcheck(pow(xf, yf), "pow");
+		}
 		break;
 	default:
 		FATAL("illegal assignment operator %d", n);
@@ -1499,12 +1504,15 @@ Cell *bltin(Node **a, int n) /* builtin
 			u = strlen(getsval(x));
 		break;
 	case FLOG:
+		errno = 0;
 		u = errcheck(log(getfval(x)), "log"); break;
 	case FINT:
 		modf(getfval(x), &u); break;
 	case FEXP:
+		errno = 0;
 		u = errcheck(exp(getfval(x)), "exp"); break;
 	case FSQRT:
+		errno = 0;
 		u = errcheck(sqrt(getfval(x)), "sqrt"); break;
 	case FSIN:
 		u = sin(getfval(x)); break;
Todd, are we up to date with upstream, or is this latent there too?
Philip Guenther