https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86999
Bug ID: 86999
Summary: Incorrect code generation and missing optimization with -fno-signed-zeros.
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: asd0025 at gmail dot com
Target Milestone: ---

Consider the following trivial example (https://godbolt.org/g/5ms6Bf):

    #include <limits.h>

    typedef float v4f __attribute__((vector_size(16)));
    typedef int v4i __attribute__((vector_size(16)));

    v4f foo(v4f n, v4f p)
    {
        return n * p + p;
    }

    template <int N>
    v4f __neg1(v4f a)
    {
        v4i v = {((N & 1) ? INT_MIN : 0), ((N & 2) ? INT_MIN : 0),
                 ((N & 4) ? INT_MIN : 0), ((N & 8) ? INT_MIN : 0)};
        return __builtin_ia32_xorps(a, (v4f)v);
    }

    template <int N>
    v4f __neg2(v4f a)
    {
        v4i v = {((N & 1) ? INT_MIN : 0), ((N & 2) ? INT_MIN : 0),
                 ((N & 4) ? INT_MIN : 0), ((N & 8) ? INT_MIN : 0)};
        return (v4f)((v4i)a ^ v);
    }

    v4f neg1C(v4f a) { return __neg1<0x0C>(a); }
    v4f neg2C(v4f a) { return __neg2<0x0C>(a); }

On GCC 7.x/8.x with -fno-signed-zeros (or when it is implied by other flags, e.g. -Ofast), foo() is not optimal on FMA-capable hardware:

    foo(float __vector(4), float __vector(4)):
            vmulps  xmm0, xmm0, xmm1
            vaddps  xmm0, xmm0, xmm1
            ret

With -fsigned-zeros:

    foo(float __vector(4), float __vector(4)):
            vfmadd132ps     xmm0, xmm1, xmm1
            ret

Incorrect code is generated only on GCC 8.x with -fno-signed-zeros:

    neg1C(float __vector(4)):
            ret

With -fsigned-zeros or with GCC 7.x:

    neg1C(float __vector(4)):
            vxorps  xmm0, xmm0, XMMWORD PTR .LC1[rip]
            ret
    .LC1:
            .long   0
            .long   0
            .long   2147483648
            .long   2147483648

Note, however, that when using bitwise xor instead of __builtin_ia32_xorps(), the generated code is correct in all cases:

    neg2C(float __vector(4)):
            vxorps  xmm0, xmm0, XMMWORD PTR .LC1[rip]
            ret
    .LC1:
            .long   0
            .long   0
            .long   2147483648
            .long   2147483648
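
For reference, here is a minimal sketch of what a correct neg2C-style function is expected to compute, so that a miscompiled build can be detected at run time. It is not part of the original report. The names neg_lanes, lane_bits and self_check are hypothetical helpers introduced only for this illustration, and the sketch uses the portable integer-xor form rather than __builtin_ia32_xorps, so by itself it exercises the variant that the report describes as correct.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstring>

typedef float v4f __attribute__((vector_size(16)));
typedef int v4i __attribute__((vector_size(16)));

// Hypothetical helper (same shape as __neg2 in the report): flips the sign
// bit of every lane i for which bit i of N is set.
template <int N>
v4f neg_lanes(v4f a)
{
    v4i v = {((N & 1) ? INT_MIN : 0), ((N & 2) ? INT_MIN : 0),
             ((N & 4) ? INT_MIN : 0), ((N & 8) ? INT_MIN : 0)};
    return (v4f)((v4i)a ^ v);
}

// Hypothetical helper: returns the raw IEEE-754 bit pattern of one float,
// so that +0.0f and -0.0f can be told apart (they compare equal as floats).
std::uint32_t lane_bits(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Checks that mask 0x0C leaves lanes 0 and 1 untouched and flips only the
// sign bit of lanes 2 and 3, including for a zero input lane.
void self_check()
{
    v4f a = {1.0f, -2.0f, 3.0f, 0.0f};
    v4f r = neg_lanes<0x0C>(a);
    for (int i = 0; i < 4; ++i) {
        std::uint32_t expected = lane_bits(a[i]) ^ (i >= 2 ? 0x80000000u : 0u);
        assert(lane_bits(r[i]) == expected);
    }
}
```

The zero in lane 3 is deliberate: the xor turns +0.0f into -0.0f, which changes the bit pattern even though the two values compare equal. Replacing the body of neg_lanes with the __builtin_ia32_xorps form from the report (x86 only) and building with -fno-signed-zeros would be one way to check a given compiler for the behaviour described above.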