https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-08-07
     Ever confirmed|0                           |1
            Summary|vectorizer missing simple   |-Ofast does not vectorize
                   |case                        |while -O3 does.
             Status|UNCONFIRMED                 |NEW

--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So here is the interesting part on the trunk:
With -O3 we vectorize the loop because the SLP vectorizer handles it, but with
-Ofast we don't, because the cost model says the vectorization is too costly.

The innermost loop for -O3:
.L3:
        addq    $1, %rax
        addpd   %xmm1, %xmm2
        addpd   %xmm1, %xmm3
        addpd   %xmm1, %xmm4
        cmpq    %rax, %rdi
        jne     .L3

The SLP vectorizer has done this since GCC 11.

Here is the inner loop for -Ofast:
.L3:
        addq    $1, %rax
        addsd   %xmm0, %xmm3
        addsd   %xmm0, %xmm6
        addsd   %xmm0, %xmm1
        addsd   %xmm0, %xmm5
        addsd   %xmm0, %xmm2
        addsd   %xmm0, %xmm4
        cmpq    %rax, %rdi
        jne     .L3

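As you can see, we don't vectorize it: all six additions stay scalar (addsd)
instead of being paired into three addpd as in the -O3 code.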
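
For readers without the testcase at hand, a loop of roughly the following
shape would be consistent with the two listings above: six independent double
accumulators, each incremented by the same value on every iteration. This is
only a hypothetical sketch, not the testcase attached to the bug; the function
name accumulate6 and its signature are made up for illustration, and the code
GCC actually emits for it will depend on the compiler version and target.

```c
#include <stddef.h>

/* Hypothetical reduction, NOT the testcase from the bug report.
   Six independent accumulators each receive the same addend per
   iteration, so adjacent pairs could be packed into 128-bit lanes. */
void accumulate6(double acc[6], double step, size_t n)
{
    double a0 = acc[0], a1 = acc[1], a2 = acc[2];
    double a3 = acc[3], a4 = acc[4], a5 = acc[5];

    for (size_t i = 0; i < n; i++) {
        a0 += step;
        a1 += step;
        a2 += step;
        a3 += step;
        a4 += step;
        a5 += step;
    }

    /* Write the results back once the loop is done. */
    acc[0] = a0; acc[1] = a1; acc[2] = a2;
    acc[3] = a3; acc[4] = a4; acc[5] = a5;
}
```

To compare the two option sets on such a file, one could compile it with
"gcc -O3 -S" and "gcc -Ofast -S" and diff the output; adding
-fopt-info-vec-missed should make GCC report why it declined to vectorize.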
