https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2024-04-18
             Blocks|                            |53947
          Component|fortran                     |tree-optimization
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
When you avoid -ffast-math you'll get

.L4:
        vmovupd (%rax), %ymm0
        addq    $32, %rax
        vmulpd  %ymm2, %ymm0, %ymm1
        vpermilpd       $5, %ymm0, %ymm0
        vaddsubpd       %ymm0, %ymm1, %ymm1
        vmovupd %ymm1, -32(%rax)
        cmpq    %rax, %rdx
        jne     .L4

This folds the negation into the vaddsubpd but still has a multiplication
by zero, kept for the sake of preserving signed zeros.
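
A minimal sketch of why the multiplication by zero cannot simply be dropped,
assuming the source loop multiplies each element by the imaginary unit (which
is what the lowered GIMPLE suggests). The function names real_full and
real_simplified are hypothetical and only serve the illustration:

```c
#include <math.h>

/* Real part of (re + i*im) * (0 + i*1), written out term by term the way
   IEEE arithmetic has to evaluate it when -ffast-math is not in effect. */
double real_full(double re, double im)
{
    return re * 0.0 - im * 1.0;
}

/* Simplified real part; only equivalent if signed zeros (and NaN/Inf
   produced by re * 0.0) may be ignored. */
double real_simplified(double re, double im)
{
    (void)re;
    return -im;
}
```

For re = 1.0 and im = 0.0 the full form computes (+0.0) - (+0.0) = +0.0,
while the simplified form yields -0.0. The two compare equal but differ in
sign bit, which is the difference the extra vmulpd preserves.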

The main issue in the way of optimal code is that we lower the operations
early, arriving at

  _6 = REALPART_EXPR <(*c_10(D))[_1]>;
  _5 = IMAGPART_EXPR <(*c_10(D))[_1]>;
  _14 = -_5;
  REALPART_EXPR <(*c_10(D))[_1]> = _14;
  IMAGPART_EXPR <(*c_10(D))[_1]> = _6;

and vectorizing that fails to use SLP, so the vectorizer falls back to an
interleaving chain. Negation isn't supported as a "two-operators" op, as
there's no corresponding "no negation" operation, and we lack a way to insert
a noop during SLP build.
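
For readers less used to GIMPLE, the lowered statements above correspond
roughly to the following scalar C loop. This is only a sketch: the cplx
struct and the name rotate_by_i are hypothetical stand-ins, not taken from
the testcase.

```c
#include <stddef.h>

/* Hypothetical stand-in for one complex(8) array element. */
typedef struct { double re, im; } cplx;

/* Scalar form of the lowered loop body: the two stores of one iteration
   form the group that SLP would have to vectorize together. */
void rotate_by_i(cplx *c, size_t n)
{
    for (size_t k = 0; k < n; ++k) {
        double re = c[k].re;   /* _6 = REALPART_EXPR <...> */
        double im = c[k].im;   /* _5 = IMAGPART_EXPR <...> */
        c[k].re = -im;         /* lane 0: negation (_14 = -_5) */
        c[k].im = re;          /* lane 1: plain copy, no operation */
    }
}
```

The two lanes perform different operations, a negation in one and a plain
copy in the other, which is the mismatch described above: there is no
"no negation" counterpart to pair with the negation.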


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
