https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95219
Richard Biener <rguenth at gcc dot gnu.org> changed:

               What|Removed                      |Added
----------------------------------------------------------------------------
   Last reconfirmed|                             |2020-05-20
     Ever confirmed|0                            |1
           Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org
   Target Milestone|---                          |11.0
             Status|UNCONFIRMED                  |ASSIGNED

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think this one is a bit older though (IIRC it was disabled before due to
a testsuite bug).  Vectorization _is_ clearly profitable - we're now using
SLP (possibly since that got induction support):

  Vector inside of loop cost: 24
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar iteration cost: 48
  Scalar outside cost: 0
  Vector outside cost: 0
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 0

vectorized to

.L2:
        movdqa  %xmm0, %xmm4
        movdqa  %xmm1, %xmm3
        paddq   %xmm2, %xmm0
        addq    $32, %rdi
        movups  %xmm4, -32(%rdi)
        paddq   %xmm2, %xmm1
        movups  %xmm3, -16(%rdi)
        cmpq    %rdi, %rax
        jne     .L2

there's a missed optimization in that we choose two (identical) IVs for
the induction (late FRE is in "simple" mode and thus does not get rid of
those as equivalent) and that we have odd IVs (the extra moves), possibly
out-of-SSA cannot coalesce because of the constants:

  # vect_vec_iv_.7_1 = PHI <{ 0, 0 }(2), _19(3)>
  # vect_vec_iv_.8_18 = PHI <{ 0, 0 }(2), _17(3)>

and tricks maybe do not apply because of vector types.

I'll take this bug.
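
To make the duplicated-IV observation concrete, here is a rough scalar model of the loop above in plain C. It is only a sketch: the original testcase is not quoted in this comment, the step held in %xmm2 is not shown, and the names v2di, v2di_add, fill_two_ivs and fill_one_iv are hypothetical. The first function mirrors the emitted code (two vector IVs that start at { 0, 0 } and advance by the same step); the second shows what the loop could look like if the two IVs were recognized as equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two 64-bit lanes, standing in for one SSE register (assumption:
   the element type is a 64-bit integer, as paddq suggests). */
typedef struct { uint64_t lane[2]; } v2di;

/* Lane-wise 64-bit add, the scalar equivalent of paddq. */
static v2di v2di_add(v2di a, v2di b)
{
    v2di r = { { a.lane[0] + b.lane[0], a.lane[1] + b.lane[1] } };
    return r;
}

/* Models the emitted loop: two IVs with the same start and the same step.
   Each iteration stores 32 bytes (four 64-bit elements). */
void fill_two_ivs(uint64_t *out, size_t n, v2di step)
{
    assert(n % 4 == 0);
    v2di iv_a = { { 0, 0 } };   /* vect_vec_iv_.7  */
    v2di iv_b = { { 0, 0 } };   /* vect_vec_iv_.8  */
    for (size_t i = 0; i < n; i += 4) {
        out[i + 0] = iv_a.lane[0];
        out[i + 1] = iv_a.lane[1];
        out[i + 2] = iv_b.lane[0];
        out[i + 3] = iv_b.lane[1];
        iv_a = v2di_add(iv_a, step);
        iv_b = v2di_add(iv_b, step);
    }
}

/* Same result with a single IV, i.e. what merging the two
   equivalent PHIs would allow. */
void fill_one_iv(uint64_t *out, size_t n, v2di step)
{
    assert(n % 4 == 0);
    v2di iv = { { 0, 0 } };
    for (size_t i = 0; i < n; i += 4) {
        out[i + 0] = iv.lane[0];
        out[i + 1] = iv.lane[1];
        out[i + 2] = iv.lane[0];
        out[i + 3] = iv.lane[1];
        iv = v2di_add(iv, step);
    }
}
```

Both functions write identical data for any step, which is why the second IV (and its paddq and extra movdqa) is redundant in this model.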