https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2024-04-18
             Blocks|                            |53947
          Component|fortran                     |tree-optimization
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
When you avoid -ffast-math you'll get

.L4:
        vmovupd (%rax), %ymm0
        addq    $32, %rax
        vmulpd  %ymm2, %ymm0, %ymm1
        vpermilpd       $5, %ymm0, %ymm0
        vaddsubpd       %ymm0, %ymm1, %ymm1
        vmovupd %ymm1, -32(%rax)
        cmpq    %rax, %rdx
        jne     .L4

this consumes the negation with the vaddsubpd but has a multiplication
by zero for the sake of preserving signed zeros.

The main issue in the way of optimal code is that we lower the operations
early, arriving at

  _6 = REALPART_EXPR <(*c_10(D))[_1]>;
  _5 = IMAGPART_EXPR <(*c_10(D))[_1]>;
  _14 = -_5;
  REALPART_EXPR <(*c_10(D))[_1]> = _14;
  IMAGPART_EXPR <(*c_10(D))[_1]> = _6;

and vectorizing that fails to use SLP which introduces an interleaving
chain.  Negation isn't supported as "two-operators" op as there's no
corresponding "no negation" operation and we lack a way to insert a
noop during SLP build.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
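
For readers following the signed-zero remark in comment #1, here is a small Python sketch of why the multiplication by zero cannot simply be dropped without -ffast-math. It assumes the loop multiplies each complex element by the imaginary unit (0 + 1i); the comment does not show the source, so that is an inference from the GIMPLE and the assembly. The function names are hypothetical and used only for illustration.

```python
import math

def mul_by_i_full(re, im):
    """(re + im*i) * (0 + 1*i), keeping every term of the complex product.

    Assumed to mirror the vmulpd-by-zero followed by vaddsubpd sequence.
    """
    zero, one = 0.0, 1.0
    return (re * zero - im * one, re * one + im * zero)

def mul_by_i_lowered(re, im):
    """Simplified form: real = -im, imag = re (the shape of the GIMPLE above)."""
    return (-im, re)

# For ordinary values the two forms agree.
print(mul_by_i_full(3.0, 4.0), mul_by_i_lowered(3.0, 4.0))

# With a zero imaginary part they differ in the sign of the real result:
# 1.0*0.0 - 0.0 is +0.0, while -(0.0) is -0.0.
full = mul_by_i_full(1.0, 0.0)
lowered = mul_by_i_lowered(1.0, 0.0)
print(math.copysign(1.0, full[0]), math.copysign(1.0, lowered[0]))
```

The two results compare equal with `==`, but the sign bit of the real part differs. That sign difference is the signed-zero behaviour the non-fast-math code keeps by retaining the multiply.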