[Bug tree-optimization/116463] [15 Regression] fast-math-complex-mls-{double,float}.c fail after r15-3087-gb07f8a301158e5

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 23 Aug 2024 05:46:59 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463


--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
As of r15-3128-gde1923f9f4d534 the following now fail:

FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-not vfmadd[123]*ph[ \\\\t]
FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-times vfmaddcph[ \\\\t] 1
FAIL: gcc.target/i386/part-vect-complexhf.c scan-assembler-times vfmaddcph[ \\\\t] 1

These look similar to the aarch64 failures (I have no idea whether the patch
helped for those).

For the first test it is fma0 that is no longer vectorized as

        vmovdqu16       (%rdx), %zmm0
        vmovdqu16       (%rsi), %zmm1
        vfmaddcph       (%rdi), %zmm1, %zmm0
        vmovdqu16       %zmm0, (%rdx)

but

        vmovdqu16       (%rsi), %zmm0
        vmovdqu16       (%rdi), %zmm2
        movl    $1431655765, %eax
        kmovd   %eax, %k1
        vpshufb .LC1(%rip), %zmm0, %zmm1
        vfmadd213ph     (%rdx), %zmm2, %zmm1
        vpshufb .LC2(%rip), %zmm0, %zmm0
        vpshufb .LC0(%rip), %zmm2, %zmm3
        vmovdqa64       %zmm0, %zmm2
        vfmadd132ph     %zmm3, %zmm1, %zmm2
        vfnmadd132ph    %zmm3, %zmm1, %zmm0
        vpblendmw       %zmm0, %zmm2, %zmm0{%k1}
        vmovdqu16       %zmm0, (%rdx)

where instead of

note:    Found COMPLEX_FMA pattern in SLP tree

we have

note:    Found VEC_ADDSUB pattern in SLP tree
note:    Target does not support VEC_ADDSUB for vector type vector(32) _Float16 

with the IL difference being (- is good, + is bad)

  _12 = REALPART_EXPR <*_3>;
  _11 = IMAGPART_EXPR <*_3>;
...
@@ -46,10 +46,10 @@
   _27 = _19 * _25;
   _28 = _20 * _25;
   _29 = _19 * _24;
-  _30 = _26 - _27;
-  _31 = _28 + _29;
-  _32 = _12 + _30;
-  _33 = _11 + _31;
+  _9 = _12 + _26;
+  _10 = _11 + _28;
+  _32 = _9 - _27;
+  _33 = _10 + _29;
   REALPART_EXPR <*_3> = _32;
   IMAGPART_EXPR <*_3> = _33;
   i_18 = i_21 + 1;

which is a different association, enabled by deleting the dead uses that
confused reassoc.
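
To make the two associations concrete, here is a minimal C sketch of a
complex multiply-accumulate written in both shapes. It is not the testcase
source: the type and function names are hypothetical, and which operand
product corresponds to which SSA name (_26 to _29) is an assumption, since
the multiplies are partly elided in the dump above. The first function
completes the complex product before adding the accumulator, which is the
shape the dump reports as COMPLEX_FMA; the second folds the accumulator into
one product of each lane first, leaving a trailing subtract/add pair like
the one reported as VEC_ADDSUB.

```c
/* Hypothetical names for illustration; not taken from the GCC testsuite. */
typedef struct { double re, im; } cplx;

/* "Good" association: form the full complex product, then accumulate.
   Mirrors  _30 = _26 - _27;  _31 = _28 + _29;
            _32 = _12 + _30;  _33 = _11 + _31;
   (the operand-to-SSA-name mapping is assumed).  */
static cplx fma_product_first(cplx acc, cplx a, cplx b)
{
    double t_re = a.re * b.re - a.im * b.im;
    double t_im = a.re * b.im + a.im * b.re;
    cplx r = { acc.re + t_re, acc.im + t_im };
    return r;
}

/* "Bad" association: add the accumulator to one product per lane first,
   then apply the remaining product with -/+.
   Mirrors  _9 = _12 + _26;   _10 = _11 + _28;
            _32 = _9 - _27;   _33 = _10 + _29;  */
static cplx fma_accumulate_first(cplx acc, cplx a, cplx b)
{
    double u_re = acc.re + a.re * b.re;
    double u_im = acc.im + a.re * b.im;
    cplx r = { u_re - a.im * b.im, u_im + a.im * b.re };
    return r;
}
```

Both functions compute the same value in exact arithmetic, which is why
reassociation may move between them under fast-math; only the shape of the
expression tree seen by the SLP pattern matcher differs.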
