https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100839
Bug ID: 100839
Summary: -O2 does dangerous optimizations using FMA (please don't break my cross product)
Product: gcc
Version: 11.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: metalcaedes at gmail dot com
Target Milestone: ---

Created attachment 50893
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50893&action=edit
simple test case (same as on compiler explorer)

I'm having problems with a cross-product function returning (slightly) wrong results when built with -O2 (or above) and with FMA enabled (e.g. with -march=znver1) - more specifically, -fexpensive-optimizations causes the problem. -ffast-math or related flags were *not* used.

Values that should be exactly 0 aren't, because the intermediate result of the multiplication part of vfmsub132ss has "infinite precision", while the value subtracted from it (also the result of a multiplication) has only single precision.

I (or rather Doom3) have a 3D vector class called idVec3 with three (single-precision) float members x, y, z. The cross product is calculated like this:

  idVec3 cross( const idVec3& v1, const idVec3 &v2 ) {
      float x = v1.y * v2.z - v1.z * v2.y;
      float y = v1.z * v2.x - v1.x * v2.z;
      float z = v1.x * v2.y - v1.y * v2.x;
      return idVec3(x, y, z);
  }

For brevity (and because it was what caused the bug I investigated[1]) I only looked at the calculation of z, but the same problem should happen with x and y:

  float crossZ(const idVec3& v1, const idVec3& v2) {
      float z = v1.x * v2.y - v1.y * v2.x;
      return z;
  }

So if v1.x * v2.y == v1.y * v2.x (as when v1 == v2), z should be exactly 0.0.
However, it's not when FMA and -fexpensive-optimizations are used, because then that function is compiled to:

  vmulss      xmm1, xmm1, xmm2
  vfmsub132ss xmm0, xmm1, xmm3
  ret

So the `v1.y * v2.x` part is calculated with a normal multiplication and its result is rounded to single precision.
Then `v1.x * v2.y - other_result` is calculated with vfmsub132ss, which means that the result of `v1.x * v2.y` is never stored: it only exists as an "infinite precision intermediate result"[2], from which the other, single-precision result of the vmulss is subtracted.

This means that, where both multiplication results should *theoretically* be identical, the function returns the rounding error between the "infinite precision" intermediate (at least double precision?) and the single-precision result. This rounding error fits comfortably into a float, because floats have great precision near zero - and it can be relatively big: with v1.x = -277.3333129883 and v1.y = -69.3333282471 (result of the multiplications: about 19228.4416), crossZ(v1, v1) returns 0.0002170140.

With -O1 (or -O2 -fno-expensive-optimizations) the generated ASM is more straightforward and, as expected, does two multiplications followed by a subtraction:

  vmovss xmm0, DWORD PTR [rdi]
  vmulss xmm0, xmm0, DWORD PTR [rsi+0x4]
  vmovss xmm1, DWORD PTR [rdi+0x4]
  vmulss xmm1, xmm1, DWORD PTR [rsi]
  vsubss xmm0, xmm0, xmm1
  ret

IMHO an optimization that basically causes a*b - b*a to not return exactly 0.0 should only be enabled by dangerous flags like -ffast-math, not by plain -O2. Incidentally, this seems to be what clang does: it only uses vfmsub* in crossZ() if -ffast-math is set.

Here's a compiler explorer link with a testcase reduced to the .z component, printing the results of that function when compiled with -O2 vs. -O1: https://gcc.godbolt.org/z/8K3vKh7b3

The problem happens with all GCC versions I tested, including the 11.1 and "trunk" versions on compiler explorer. I didn't test this, but I wouldn't be surprised if plain C was also affected (and not just C++).

[1] https://github.com/RobertBeckebans/RBDOOM-3-BFG/issues/436
[2] https://www.felixcloutier.com/x86/vfmsub132ss:vfmsub213ss:vfmsub231ss