In <https://lists.gnu.org/archive/html/bug-gnulib/2023-10/msg00051.html> I wrote:
- On i386 platforms (this includes x86_64 platforms with "gcc -m32"), tracking and trapping of floating-point exceptions *should* work, according to the Intel documentation. But it doesn't do so reliably. Depending on gcc optimizations, it sometimes works and sometimes not. When compiling without gcc optimizations, tracking and trapping of floating-point exceptions appears to work for 'long double' operations, but not for 'float' and 'double' operations. I observe this across different OSes and across CPU vendors (AMD and Intel). I tried adding sync instructions ("cpuid" or "fwait"); that did not help. I also tried the gcc options -mfpmath=387 and -mfpmath=sse; that did not help either. It looks like the Intel 'fld' instruction, when loading a 32-bit or 64-bit SNaN into a register, converts it to an 80-bit QNaN without setting the Invalid Operation flag in the fstat register. Is there a way to get this working? Meanwhile, here's my understanding of the issue. Table of contents: I. Problems at instruction level II. Problems at C level III. Workarounds at C level IV. Consequences Problems at instruction level ============================= The i386 and x86_64 processors come with two floating-point units: the 387 unit and the SSE unit [1]. GCC for i386, by default, uses only the 387 unit. (In order to use the SSE unit, one needs to pass the compiler options '-mfpmath=sse -march=pentium4'). In order to get a floating-point value into the 387 unit, the compiler needs to emit an 'fld' instruction. [2] This instruction, when used to load a floating-point value from memory into the 387 unit, comes in three flavours: - flds, loads a 32-bit floating-point value (IEEE 754 single-precision) - fldl, loads a 64-bit floating-point value (IEEE 754 double-precision) - fldt, loads a 80-bit floating-point value (Intel "extended precision") In the cases 'flds' and 'fldl', the instruction converts a signalling NaN value to a quiet NaN value in the 80-bit format. [3] And it sets the "Invalid Operand" in the fstat (floating-point status) register [4]. I observe this directly with gdb: - for flds: The value 0xff800001 (an SNaN) in memory becomes in the FPU: R7 = Special 0xffffc000010000000000 QNaN, and fstat bit 0 gets set. - for fldl: The value 0xfff0000000100000 (an SNaN) in memory becomes in the FPU: R7 = Special 0xffffc000000080000000 QNaN, and fstat bit 0 gets set. - for fldt: The value 0xffff8000000100000000 (an SNaN) in memory becomes in the FPU: R7 = Special 0xffff8000000100000000 SNaN, and fstat is unchanged. This is not what IEEE 754 has specified. IEEE 754 has specified that floating-point *operations* (like addition, multiplication) should recognize a signalling NaN, set the "Invalid operation" exception flag, and — if no trap handler for this exception is installed — produce a quiet NaN as result of the operation. [5][6] Problems at C level =================== 1) Since the ABI often specifies that a function's return value, if of type 'float', 'double', or 'long double', is returned in a floating-point register, functions that return 'float' or 'double' can never return SNaNs. They always return QNaNs. This includes, in particular, the functions 'construct_SNaNf' and 'construct_SNaNd' in snan.h ! 2) Where the programmer had foreseen an "Invalid operation" exception to occur in one place (namely a floating-point operations), it actually occurs before that point, namely at the place where the operand it loaded into the FPU — which is at the compiler's discretion. Thus the resulting behaviour is highly dependent on the compiler optimization level. The use of fetestexcept (FE_INVALID) is therefore unreliable on i386 and x86_64, when it comes to SNaNs. Workarounds at C level ====================== Passing around a value of type union { double value; uint64_t word; } does preserve SNaNs, because - this type does not force a return value into an FPU register, - when copying such a value between memory locations, the compiler will not use an 'fld' instruction. Find attached two programs totalorder-snan-via-double.c and totalorder-snan-via-union.c, that implement a unit test for the totalorder() function. Once with 'double' as return value, the other one with a union type as return value. Here are the results (with gcc 11 and clang 16): x86_64: $ gcc -Wall totalorder-snan-via-union.c -mfpmath=387 -lm && ./a.out $ gcc -Wall totalorder-snan-via-union.c -mfpmath=387 -O2 -lm && ./a.out $ gcc -Wall totalorder-snan-via-union.c -lm && ./a.out $ gcc -Wall totalorder-snan-via-union.c -O2 -lm && ./a.out $ gcc -Wall totalorder-snan-via-double.c -mfpmath=387 -lm && ./a.out Failed: i=0 j=1 Failed: i=1 j=0 Failed: i=12 j=13 Failed: i=13 j=12 $ gcc -Wall totalorder-snan-via-double.c -mfpmath=387 -O2 -lm && ./a.out $ gcc -Wall totalorder-snan-via-double.c -lm && ./a.out $ gcc -Wall totalorder-snan-via-double.c -O2 -lm && ./a.out $ clang -Wall totalorder-snan-via-union.c -lm && ./a.out $ clang -Wall totalorder-snan-via-union.c -O2 -lm && ./a.out $ clang -Wall totalorder-snan-via-double.c -lm && ./a.out $ clang -Wall totalorder-snan-via-double.c -O2 -lm && ./a.out i386: $ i686-linux-gnu-gcc -Wall totalorder-snan-via-union.c -mfpmath=387 -lm && ./a.out $ i686-linux-gnu-gcc -Wall totalorder-snan-via-union.c -mfpmath=387 -O2 -lm && ./a.out $ i686-linux-gnu-gcc -Wall totalorder-snan-via-union.c -mfpmath=sse -march=pentium4 -lm && ./a.out $ i686-linux-gnu-gcc -Wall totalorder-snan-via-union.c -mfpmath=sse -march=pentium4 -O2 -lm && ./a.out $ i686-linux-gnu-gcc -Wall totalorder-snan-via-double.c -mfpmath=387 -lm && ./a.out Failed: i=0 j=1 Failed: i=1 j=0 Failed: i=12 j=13 Failed: i=13 j=12 $ i686-linux-gnu-gcc -Wall totalorder-snan-via-double.c -mfpmath=387 -O2 -lm && ./a.out $ i686-linux-gnu-gcc -Wall totalorder-snan-via-double.c -mfpmath=sse -march=pentium4 -lm && ./a.out Failed: i=0 j=1 Failed: i=1 j=0 Failed: i=12 j=13 Failed: i=13 j=12 $ i686-linux-gnu-gcc -Wall totalorder-snan-via-double.c -mfpmath=sse -march=pentium4 -O2 -lm && ./a.out $ clang -m32 -Wall totalorder-snan-via-union.c -lm && ./a.out $ clang -m32 -Wall totalorder-snan-via-union.c -O2 -lm && ./a.out $ clang -m32 -Wall totalorder-snan-via-union.c -mfpmath=sse -march=pentium4 -lm && ./a.out $ clang -m32 -Wall totalorder-snan-via-union.c -mfpmath=sse -march=pentium4 -O2 -lm && ./a.out $ clang -m32 -Wall totalorder-snan-via-double.c -lm && ./a.out Failed: i=0 j=1 Failed: i=1 j=0 Failed: i=12 j=13 Failed: i=13 j=12 $ clang -m32 -Wall totalorder-snan-via-double.c -O2 -lm && ./a.out $ clang -m32 -Wall totalorder-snan-via-double.c -mfpmath=sse -march=pentium4 -lm && ./a.out Failed: i=0 j=1 Failed: i=1 j=0 Failed: i=12 j=13 Failed: i=13 j=12 $ clang -m32 -Wall totalorder-snan-via-double.c -mfpmath=sse -march=pentium4 -O2 -lm && ./a.out What you can see, is: - The totalorder-snan-via-union.c program always works fine, regardless of CPU type, compiler, optimization levels. - The totalorder-snan-via-double.c program malfunctions when the compiler uses 387 instructions (as opposed to SSE (MMX) instructions) _and_ the compiler optimizations disallowed inlining of the 'construct_signalling_nan' function. Consequences ============ The consequences are different, according to the type of program. * Programs that don't need to distinguish QNaN and SNaN: Such programs work on i386 and x86_64, like on other CPUs. No special precautions are needed. Examples: The isfinite() or *printf unit tests. * Programs that need to distinguish QNaN and SNaN, but don't do floating-point operations: In these cases it is sufficient to - either pass values by reference, not by value, - or apply the 'union' workaround. Example: The totalorder() unit tests. * Programs that need to distinguish QNaN and SNaN, and that do floating-point operations. In these cases the 'union' workaround is typically too clumsy. So, for this case, there is typically no good workaround. Example: Code that wants to rely on the FE_INVALID flag. [1] https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/x86-Options.html [2] "Intel 64 and IA-32 Architectures Software Developer's Manual" volume 1, section 8.3.3 Data Transfer Instructions [3] https://stackoverflow.com/questions/61024467/ [4] "Intel 64 and IA-32 Architectures Software Developer's Manual" volume 2, section 3.2 FLD [5] https://en.wikipedia.org/wiki/IEEE_754#Exception_handling [6] https://en.wikipedia.org/wiki/NaN#Signaling_NaN
#define _GNU_SOURCE 1 #include <math.h> #include <stdint.h> #include <stdio.h> volatile double zero = 0.0; volatile double one = 1.0; volatile double minus_zero = -0.0; double positive_quiet_nan () { volatile double n = zero / zero; return signbit (n) ? -n : n; } double negative_quiet_nan () { volatile double n = zero / zero; return signbit (n) ? n : -n; } /* Assume IEEE 754 double-precision <https://en.wikipedia.org/wiki/Double-precision_floating-point_format> */ typedef union { double value; uint64_t word; } memory_double; double construct_signalling_nan (double quiet_value) { memory_double m; m.value = quiet_value; /* Turn the quiet NaN into a signalling NaN. */ m.word ^= (uint64_t) 1 << 51; /* Set some arbitrary mantissa bit. */ m.word |= (uint64_t) 1 << 19; return m.value; } int main () { double x[] = { negative_quiet_nan (), construct_signalling_nan (negative_quiet_nan ()), - one / zero, -1e37, -1, -1e-5, minus_zero, zero, 1e-5, 1, 1e37, one / zero, construct_signalling_nan (positive_quiet_nan ()), positive_quiet_nan () }; int result = 0; for (int i = 0; i < 14; i++) for (int j = 0; j < 14; j++) if (!(!!totalorder (&x[i], &x[j]) == (i <= j))) { fprintf (stderr, "Failed: i=%d j=%d\n", i, j); result = 1; } return result; } /* * i686-linux-gnu-gcc-11 -ggdb -Wall totalorder-snan-via-double.c -lm -o totalorder-snan-via-double */
#define _GNU_SOURCE 1 #include <math.h> #include <stdint.h> #include <stdio.h> volatile double zero = 0.0; volatile double one = 1.0; volatile double minus_zero = -0.0; double positive_quiet_nan () { volatile double n = zero / zero; return signbit (n) ? -n : n; } double negative_quiet_nan () { volatile double n = zero / zero; return signbit (n) ? n : -n; } /* Assume IEEE 754 double-precision <https://en.wikipedia.org/wiki/Double-precision_floating-point_format> */ typedef union { double value; uint64_t word; } memory_double; memory_double construct_signalling_nan (double quiet_value) { memory_double m; m.value = quiet_value; /* Turn the quiet NaN into a signalling NaN. */ m.word ^= (uint64_t) 1 << 51; /* Set some arbitrary mantissa bit. */ m.word |= (uint64_t) 1 << 19; return m; } int main () { memory_double x[] = { { negative_quiet_nan () }, construct_signalling_nan (negative_quiet_nan ()), { - one / zero }, { -1e37 }, { -1 }, { -1e-5 }, { minus_zero }, { zero }, { 1e-5 }, { 1 }, { 1e37 }, { one / zero }, construct_signalling_nan (positive_quiet_nan ()), { positive_quiet_nan () } }; int result = 0; for (int i = 0; i < 14; i++) for (int j = 0; j < 14; j++) if (!(!!totalorder (&x[i].value, &x[j].value) == (i <= j))) { fprintf (stderr, "Failed: i=%d j=%d\n", i, j); result = 1; } return result; } /* * i686-linux-gnu-gcc-11 -ggdb -Wall totalorder-snan-via-union.c -lm -o totalorder-snan-via-union */