https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Last reconfirmed|                            |2021-08-07
    Ever confirmed|0                           |1
           Summary|vectorizer missing simple   |-Ofast does not vectorize
                  |case                        |while -O3 does.
            Status|UNCONFIRMED                 |NEW

--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So here is the interesting part for the trunk: with -O3 we can vectorize the loop because the SLP vectorizer handles it, but with -Ofast we don't, because the vectorizer's cost model decides that vectorization is too costly.

The innermost loop for -O3:

.L3:
        addq    $1, %rax
        addpd   %xmm1, %xmm2
        addpd   %xmm1, %xmm3
        addpd   %xmm1, %xmm4
        cmpq    %rax, %rdi
        jne     .L3

The SLP vectorizer has done this since GCC 11.

Here is the innermost loop for -Ofast:

.L3:
        addq    $1, %rax
        addsd   %xmm0, %xmm3
        addsd   %xmm0, %xmm6
        addsd   %xmm0, %xmm1
        addsd   %xmm0, %xmm5
        addsd   %xmm0, %xmm2
        addsd   %xmm0, %xmm4
        cmpq    %rax, %rdi
        jne     .L3

As you can see, we don't vectorize it.
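For readers without the attached testcase at hand, here is a hypothetical C loop with the same shape as the assembly above. It keeps six independent double accumulators, each incremented by the same value on every iteration. At -O3 the SLP vectorizer can pack them pairwise into three addpd operations. The scalar -Ofast listing instead shows six addsd. The function name accumulate6 and its signature are illustrative assumptions, not the code from this PR. Whether a given GCC version vectorizes it depends on the cost model and target.

```c
/* Hypothetical loop mirroring the .L3 listings above (not the PR's testcase):
   six independent double accumulators, each incremented by the same x per
   iteration. Packed pairwise, this maps to three addpd per iteration;
   left scalar, it is six addsd. */
void accumulate6(double acc[6], double x, long n)
{
    /* Load the accumulators into locals so they can live in registers. */
    double s0 = acc[0], s1 = acc[1], s2 = acc[2];
    double s3 = acc[3], s4 = acc[4], s5 = acc[5];

    for (long i = 0; i < n; i++) {
        s0 += x; s1 += x; s2 += x;
        s3 += x; s4 += x; s5 += x;
    }

    /* Store the results back. */
    acc[0] = s0; acc[1] = s1; acc[2] = s2;
    acc[3] = s3; acc[4] = s4; acc[5] = s5;
}
```

Compiling this with `gcc -O3 -S` and with `gcc -Ofast -S` and comparing the inner loops is one way to check whether a given compiler shows the same difference.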