https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116738

            Bug ID: 116738
           Summary: Constant folding of _mm_min_ss and _mm_max_ss is wrong
           Product: gcc
           Version: 14.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kobalicek.petr at gmail dot com
  Target Milestone: ---

GCC miscompiles x86 intrinsics whose behavior is precisely defined at the ISA
level. The problem seems to occur when an operand is known at compile time:
constant folding then applies a different operation than the one the CPU
performs when it executes the instruction.

Here is the definition of [V]MINSS:

```
MIN(SRC1, SRC2)
{
    IF ((SRC1 = 0.0) and (SRC2 = 0.0)) THEN DEST := SRC2;
        ELSE IF (SRC1 = NaN) THEN DEST := SRC2; FI;
        ELSE IF (SRC2 = NaN) THEN DEST := SRC2; FI;
        ELSE IF (SRC1 < SRC2) THEN DEST := SRC1;
        ELSE DEST := SRC2;
    FI;
}
```
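
As a rough illustration, the pseudocode above collapses to a single ordered comparison in plain C++. This is only a sketch of the ISA semantics, not GCC's or Intel's code: the names `ref_min_ss`, `ref_max_ss` and `check_reference_model` are hypothetical, and the `ref_max_ss` model assumes that [V]MAXSS is defined as the mirror image of [V]MINSS (with `>` in place of `<`), which is not quoted in this report.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Scalar model of [V]MINSS as quoted above: SRC1 is returned only when the
// ordered comparison SRC1 < SRC2 is true. A NaN operand or two zeros make
// the comparison false, so SRC2 is returned in those cases.
inline float ref_min_ss(float src1, float src2) {
    return (src1 < src2) ? src1 : src2;
}

// Assumed mirror image for [V]MAXSS (not quoted in this report).
inline float ref_max_ss(float src1, float src2) {
    return (src1 > src2) ? src1 : src2;
}

// Hypothetical self-check of the cases the pseudocode singles out.
inline void check_reference_model() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    assert(ref_min_ss(nan, 1.0f) == 1.0f);           // SRC1 is NaN -> SRC2
    assert(std::isnan(ref_min_ss(1.0f, nan)));       // SRC2 is NaN -> SRC2
    assert(std::signbit(ref_min_ss(0.0f, -0.0f)));   // both zero -> SRC2 (-0)
    assert(!std::signbit(ref_max_ss(-0.0f, 0.0f)));  // both zero -> SRC2 (+0)
}
```

The model only holds without fast-math style options, since those allow the compiler to ignore NaN and signed-zero distinctions.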

So it's clear that SRC1 is selected only when the ordered comparison `SRC1
< SRC2` is true; in every other case, including a NaN operand or two zeros,
the result is SRC2. However, GCC doesn't seem to respect this detail. Here is
a test case that I was able to craft:

```
    // Demonstration of a GCC bug in constant folding of SIMD intrinsics:
    #include <x86intrin.h>
    #include <numeric>
    #include <limits>
    #include <stdio.h>

    float clamp(float f) {
        __m128 v = _mm_set_ss(f);
        __m128 zero = _mm_setzero_ps();
        __m128 greatest = _mm_set_ss(std::numeric_limits<float>::max());

        v = _mm_min_ss(v, greatest);
        v = _mm_max_ss(v, zero);

        return _mm_cvtss_f32(v);
    }

    int main() {
        printf("clamp(-0) -> %f\n", clamp(-0.0f));
        printf("clamp(nan) -> %f\n",
               clamp(std::numeric_limits<float>::quiet_NaN()));
        return 0;
    }
```

GCC results (wrong):

    clamp(-0) -> -0.000000
    clamp(nan) -> nan

Clang results (expected):

    clamp(-0) -> 0.000000
    clamp(nan) -> 340282346638528859811704183484516925440.000000

Here is a compiler explorer link:

    - https://godbolt.org/z/6afjoaj86

I'm aware this is a possible duplicate of an [UNCONFIRMED] bug:

    - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99497

However, that bug report mentions fast-math, which I'm not using and am not
interested in at all.

This bug makes it impossible to write test cases for some of the optimized
functions I implement: the tests fail with GCC, while other compilers produce
correct results.

A possible workaround is to use the _ps variants of the intrinsics instead of
the _ss ones, but I would like to avoid that, as in some cases I really do
work with a scalar value only.
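
For reference, the workaround could look roughly like the sketch below. It reuses the `clamp` test case with the packed intrinsics; the names `clamp_ps` and `check_clamp_ps` are hypothetical, and the sketch does not establish that every GCC version leaves the packed form unfolded.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <xmmintrin.h>

// Same clamp as in the test case, but using the packed (_ps) min/max
// intrinsics; only lane 0 is read back at the end.
float clamp_ps(float f) {
    __m128 v = _mm_set_ss(f);  // lanes 1..3 are zero
    __m128 zero = _mm_setzero_ps();
    __m128 greatest = _mm_set1_ps(std::numeric_limits<float>::max());

    v = _mm_min_ps(v, greatest);
    v = _mm_max_ps(v, zero);

    return _mm_cvtss_f32(v);
}

// Hypothetical check against the results listed above as expected.
void check_clamp_ps() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    assert(!std::signbit(clamp_ps(-0.0f)));  // +0, not -0
    assert(clamp_ps(nan) == std::numeric_limits<float>::max());
}
```

The cost is that all four lanes are processed even though only the lowest one carries data.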

Interestingly, when the test case is compiled in debug mode (without
optimizations), GCC behaves correctly, so the bug appears to be tied to the
optimization pipeline.

I'm not aware of any UB in this test case.
