signalling NaNs on i386 and x86_64

Bruno Haible Sat, 14 Oct 2023 05:54:24 -0700

In <https://lists.gnu.org/archive/html/bug-gnulib/2023-10/msg00051.html>
I wrote:


      - On i386 platforms (this includes x86_64 platforms with "gcc -m32"),
        tracking and trapping of floating-point exceptions *should* work,
        according to the Intel documentation. But it doesn't do so reliably.
        Depending on gcc optimizations, it sometimes works and sometimes not.
        When compiling without gcc optimizations, tracking and trapping of
        floating-point exceptions appears to work for 'long double' operations,
        but not for 'float' and 'double' operations.

        I observe this across different OSes and across CPU vendors (AMD and
        Intel). I tried adding sync instructions ("cpuid" or "fwait"); that
        did not help. I also tried the gcc options -mfpmath=387 and
        -mfpmath=sse; that did not help either.

        It looks like the Intel 'fld' instruction, when loading a 32-bit or
        64-bit SNaN into a register, converts it to an 80-bit QNaN without
        setting the Invalid Operation flag in the fstat register.

        Is there a way to get this working?

Meanwhile, here's my understanding of the issue.

Table of contents:
  I. Problems at instruction level
  II. Problems at C level
  III. Workarounds at C level
  IV. Consequences


Problems at instruction level
=============================

The i386 and x86_64 processors come with two floating-point units: the 387
unit and the SSE unit [1]. GCC for i386, by default, uses only the 387 unit.
(In order to use the SSE unit, one needs to pass the compiler options
'-mfpmath=sse -march=pentium4').

In order to get a floating-point value into the 387 unit, the compiler needs
to emit an 'fld' instruction. [2] This instruction, when used to load a
floating-point value from memory into the 387 unit, comes in three flavours:
  - flds, loads a 32-bit floating-point value (IEEE 754 single-precision)
  - fldl, loads a 64-bit floating-point value (IEEE 754 double-precision)
  - fldt, loads a 80-bit floating-point value (Intel "extended precision")

In the cases 'flds' and 'fldl', the instruction converts a signalling NaN
value to a quiet NaN value in the 80-bit format. [3] And it sets the
"Invalid Operand" in the fstat (floating-point status) register [4].

I observe this directly with gdb:
  - for flds:
    The value 0xff800001 (an SNaN) in memory
    becomes in the FPU:  R7 = Special 0xffffc000010000000000 QNaN,
    and fstat bit 0 gets set.
  - for fldl:
    The value 0xfff0000000100000 (an SNaN) in memory
    becomes in the FPU:  R7 = Special 0xffffc000000080000000 QNaN,
    and fstat bit 0 gets set.
  - for fldt:
    The value 0xffff8000000100000000 (an SNaN) in memory
    becomes in the FPU:  R7 = Special 0xffff8000000100000000 SNaN,
    and fstat is unchanged.

This is not what IEEE 754 has specified. IEEE 754 has specified that
floating-point *operations* (like addition, multiplication) should
recognize a signalling NaN, set the "Invalid operation" exception flag,
and — if no trap handler for this exception is installed — produce a
quiet NaN as result of the operation. [5][6]


Problems at C level
===================

1) Since the ABI often specifies that a function's return value,
if of type 'float', 'double', or 'long double', is returned in a
floating-point register, functions that return 'float' or 'double'
can never return SNaNs. They always return QNaNs. This includes,
in particular, the functions 'construct_SNaNf' and 'construct_SNaNd'
in snan.h !

2) Where the programmer had foreseen an "Invalid operation" exception
to occur in one place (namely a floating-point operations), it
actually occurs before that point, namely at the place where the
operand it loaded into the FPU — which is at the compiler's
discretion. Thus the resulting behaviour is highly dependent on
the compiler optimization level. The use of
  fetestexcept (FE_INVALID)
is therefore unreliable on i386 and x86_64, when it comes to SNaNs.


Workarounds at C level
======================

Passing around a value of type
  union { double value; uint64_t word; }
does preserve SNaNs, because
  - this type does not force a return value into an FPU register,
  - when copying such a value between memory locations, the compiler
    will not use an 'fld' instruction.

Find attached two programs totalorder-snan-via-double.c and
totalorder-snan-via-union.c, that implement a unit test for the
totalorder() function. Once with 'double' as return value, the other
one with a union type as return value.

Here are the results (with gcc 11 and clang 16):


x86_64:

$ gcc -Wall totalorder-snan-via-union.c -mfpmath=387 -lm && ./a.out
$ gcc -Wall totalorder-snan-via-union.c -mfpmath=387 -O2 -lm && ./a.out
$ gcc -Wall totalorder-snan-via-union.c -lm && ./a.out
$ gcc -Wall totalorder-snan-via-union.c -O2 -lm && ./a.out
$ gcc -Wall totalorder-snan-via-double.c -mfpmath=387 -lm && ./a.out
Failed: i=0 j=1
Failed: i=1 j=0
Failed: i=12 j=13
Failed: i=13 j=12
$ gcc -Wall totalorder-snan-via-double.c -mfpmath=387 -O2 -lm && ./a.out
$ gcc -Wall totalorder-snan-via-double.c -lm && ./a.out
$ gcc -Wall totalorder-snan-via-double.c -O2 -lm && ./a.out

$ clang -Wall totalorder-snan-via-union.c -lm && ./a.out
$ clang -Wall totalorder-snan-via-union.c -O2 -lm && ./a.out
$ clang -Wall totalorder-snan-via-double.c -lm && ./a.out
$ clang -Wall totalorder-snan-via-double.c -O2 -lm && ./a.out

i386:

$ i686-linux-gnu-gcc -Wall totalorder-snan-via-union.c -mfpmath=387 -lm && 
./a.out
$ i686-linux-gnu-gcc -Wall totalorder-snan-via-union.c -mfpmath=387 -O2 -lm && 
./a.out
$ i686-linux-gnu-gcc -Wall totalorder-snan-via-union.c -mfpmath=sse 
-march=pentium4 -lm && ./a.out
$ i686-linux-gnu-gcc -Wall totalorder-snan-via-union.c -mfpmath=sse 
-march=pentium4 -O2 -lm && ./a.out
$ i686-linux-gnu-gcc -Wall totalorder-snan-via-double.c -mfpmath=387 -lm && 
./a.out
Failed: i=0 j=1
Failed: i=1 j=0
Failed: i=12 j=13
Failed: i=13 j=12
$ i686-linux-gnu-gcc -Wall totalorder-snan-via-double.c -mfpmath=387 -O2 -lm && 
./a.out
$ i686-linux-gnu-gcc -Wall totalorder-snan-via-double.c -mfpmath=sse 
-march=pentium4 -lm && ./a.out
Failed: i=0 j=1
Failed: i=1 j=0
Failed: i=12 j=13
Failed: i=13 j=12
$ i686-linux-gnu-gcc -Wall totalorder-snan-via-double.c -mfpmath=sse 
-march=pentium4 -O2 -lm && ./a.out

$ clang -m32 -Wall totalorder-snan-via-union.c -lm && ./a.out
$ clang -m32 -Wall totalorder-snan-via-union.c -O2 -lm && ./a.out
$ clang -m32 -Wall totalorder-snan-via-union.c -mfpmath=sse -march=pentium4 -lm 
&& ./a.out
$ clang -m32 -Wall totalorder-snan-via-union.c -mfpmath=sse -march=pentium4 -O2 
-lm && ./a.out
$ clang -m32 -Wall totalorder-snan-via-double.c -lm && ./a.out
Failed: i=0 j=1
Failed: i=1 j=0
Failed: i=12 j=13
Failed: i=13 j=12
$ clang -m32 -Wall totalorder-snan-via-double.c -O2 -lm && ./a.out
$ clang -m32 -Wall totalorder-snan-via-double.c -mfpmath=sse -march=pentium4 
-lm && ./a.out
Failed: i=0 j=1
Failed: i=1 j=0
Failed: i=12 j=13
Failed: i=13 j=12
$ clang -m32 -Wall totalorder-snan-via-double.c -mfpmath=sse -march=pentium4 
-O2 -lm && ./a.out

What you can see, is:
  - The totalorder-snan-via-union.c program always works fine, regardless of
    CPU type, compiler, optimization levels.
  - The totalorder-snan-via-double.c program malfunctions when the compiler
    uses 387 instructions (as opposed to SSE (MMX) instructions) _and_ the
    compiler optimizations disallowed inlining of the 'construct_signalling_nan'
    function.


Consequences
============

The consequences are different, according to the type of program.

* Programs that don't need to distinguish QNaN and SNaN:
  Such programs work on i386 and x86_64, like on other CPUs.
  No special precautions are needed.
  Examples: The isfinite() or *printf unit tests.

* Programs that need to distinguish QNaN and SNaN, but don't do
  floating-point operations:
  In these cases it is sufficient to
    - either pass values by reference, not by value,
    - or apply the 'union' workaround.
  Example: The totalorder() unit tests.

* Programs that need to distinguish QNaN and SNaN, and that do
  floating-point operations.
  In these cases the 'union' workaround is typically too clumsy.
  So, for this case, there is typically no good workaround.
  Example: Code that wants to rely on the FE_INVALID flag.


[1] https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/x86-Options.html
[2] "Intel 64 and IA-32 Architectures Software Developer's Manual"
    volume 1, section 8.3.3 Data Transfer Instructions
[3] https://stackoverflow.com/questions/61024467/
[4] "Intel 64 and IA-32 Architectures Software Developer's Manual"
    volume 2, section 3.2 FLD
[5] https://en.wikipedia.org/wiki/IEEE_754#Exception_handling
[6] https://en.wikipedia.org/wiki/NaN#Signaling_NaN

#define _GNU_SOURCE 1
#include <math.h>
#include <stdint.h>
#include <stdio.h>

volatile double zero = 0.0;
volatile double one = 1.0;
volatile double minus_zero = -0.0;

double positive_quiet_nan ()
{
  volatile double n = zero / zero;
  return signbit (n) ? -n : n;
}

double negative_quiet_nan ()
{
  volatile double n = zero / zero;
  return signbit (n) ? n : -n;
}

/* Assume IEEE 754 double-precision
   <https://en.wikipedia.org/wiki/Double-precision_floating-point_format> */
typedef union { double value; uint64_t word; } memory_double;

double construct_signalling_nan (double quiet_value)
{
  memory_double m;
  m.value = quiet_value;
  /* Turn the quiet NaN into a signalling NaN.  */
  m.word ^= (uint64_t) 1 << 51;
  /* Set some arbitrary mantissa bit.  */
  m.word |= (uint64_t) 1 << 19;
  return m.value;
}

int
main ()
{
  double x[] =
    {
      negative_quiet_nan (),
      construct_signalling_nan (negative_quiet_nan ()),
      - one / zero,
      -1e37,
      -1,
      -1e-5,
      minus_zero,
      zero,
      1e-5,
      1,
      1e37,
      one / zero,
      construct_signalling_nan (positive_quiet_nan ()),
      positive_quiet_nan ()
    };
  int result = 0;

  for (int i = 0; i < 14; i++)
    for (int j = 0; j < 14; j++)
      if (!(!!totalorder (&x[i], &x[j]) == (i <= j)))
        {
          fprintf (stderr, "Failed: i=%d j=%d\n", i, j);
          result = 1;
        }

  return result;
}

/*
 * i686-linux-gnu-gcc-11 -ggdb -Wall totalorder-snan-via-double.c -lm -o totalorder-snan-via-double
 */

#define _GNU_SOURCE 1
#include <math.h>
#include <stdint.h>
#include <stdio.h>

volatile double zero = 0.0;
volatile double one = 1.0;
volatile double minus_zero = -0.0;

double positive_quiet_nan ()
{
  volatile double n = zero / zero;
  return signbit (n) ? -n : n;
}

double negative_quiet_nan ()
{
  volatile double n = zero / zero;
  return signbit (n) ? n : -n;
}

/* Assume IEEE 754 double-precision
   <https://en.wikipedia.org/wiki/Double-precision_floating-point_format> */
typedef union { double value; uint64_t word; } memory_double;

memory_double construct_signalling_nan (double quiet_value)
{
  memory_double m;
  m.value = quiet_value;
  /* Turn the quiet NaN into a signalling NaN.  */
  m.word ^= (uint64_t) 1 << 51;
  /* Set some arbitrary mantissa bit.  */
  m.word |= (uint64_t) 1 << 19;
  return m;
}

int
main ()
{
  memory_double x[] =
    {
      { negative_quiet_nan () },
      construct_signalling_nan (negative_quiet_nan ()),
      { - one / zero },
      { -1e37 },
      { -1 },
      { -1e-5 },
      { minus_zero },
      { zero },
      { 1e-5 },
      { 1 },
      { 1e37 },
      { one / zero },
      construct_signalling_nan (positive_quiet_nan ()),
      { positive_quiet_nan () }
    };
  int result = 0;

  for (int i = 0; i < 14; i++)
    for (int j = 0; j < 14; j++)
      if (!(!!totalorder (&x[i].value, &x[j].value) == (i <= j)))
        {
          fprintf (stderr, "Failed: i=%d j=%d\n", i, j);
          result = 1;
        }

  return result;
}

/*
 * i686-linux-gnu-gcc-11 -ggdb -Wall totalorder-snan-via-union.c -lm -o totalorder-snan-via-union
 */

signalling NaNs on i386 and x86_64

Reply via email to