[Bug middle-end/86999] New: Incorrect code generation and missing optimization with -fno-signed-zeros.

asd0025 at gmail dot com Fri, 17 Aug 2018 10:03:39 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86999


            Bug ID: 86999
           Summary: Incorrect code generation and missing optimization
                    with -fno-signed-zeros.
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: asd0025 at gmail dot com
  Target Milestone: ---

Consider the following trivial example (https://godbolt.org/g/5ms6Bf):

  #include <limits.h>

  typedef float v4f __attribute__((vector_size(16)));
  typedef int v4i __attribute__((vector_size(16)));

  v4f foo(v4f n, v4f p)
  {
    return n * p + p;
  }

  template <int N> v4f __neg1(v4f a)
  {
    v4i v = {((N & 1) ? INT_MIN : 0), ((N & 2) ? INT_MIN : 0),
             ((N & 4) ? INT_MIN : 0), ((N & 8) ? INT_MIN : 0)};
    return __builtin_ia32_xorps(a, (v4f)v);
  }

  template <int N> v4f __neg2(v4f a)
  {
    v4i v = {((N & 1) ? INT_MIN : 0), ((N & 2) ? INT_MIN : 0),
             ((N & 4) ? INT_MIN : 0), ((N & 8) ? INT_MIN : 0)};
    return (v4f)((v4i)a ^ v);
  }

  v4f neg1C(v4f a)
  {
    return __neg1<0x0C>(a);
  }

  v4f neg2C(v4f a)
  {
    return __neg2<0x0C>(a);
  }

On GCC 7.x/8.x with -fno-signed-zeros (or when it is implied by other flags,
e.g. -Ofast), foo() is not optimal on FMA-capable hardware; the multiply and
the add are emitted as two separate instructions:

  foo(float __vector(4), float __vector(4)):
        vmulps  xmm0, xmm0, xmm1
        vaddps  xmm0, xmm0, xmm1
        ret

With -fsigned-zeros:

  foo(float __vector(4), float __vector(4)):
        vfmadd132ps     xmm0, xmm1, xmm1
        ret

Incorrect code is generated only on GCC 8.x with -fno-signed-zeros: neg1C() is
reduced to a bare ret, so the xor is dropped and the argument is returned
unchanged:

  neg1C(float __vector(4)):
        ret

With -fsigned-zeros or with GCC 7.x:

  neg1C(float __vector(4)):
        vxorps  xmm0, xmm0, XMMWORD PTR .LC1[rip]
        ret
  .LC1:
        .long   0
        .long   0
        .long   2147483648
        .long   2147483648

Note, however, that when a bitwise xor is used instead of
__builtin_ia32_xorps(), the generated code is correct in all cases:

  neg2C(float __vector(4)):
        vxorps  xmm0, xmm0, XMMWORD PTR .LC1[rip]
        ret
  .LC1:
        .long   0
        .long   0
        .long   2147483648
        .long   2147483648
