https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95219

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-05-20
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org
   Target Milestone|---                         |11.0
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think this one is a bit older though (IIRC it was disabled before due to a
testsuite bug).  Vectorization _is_ clearly profitable - we're now using SLP
(possibly since that got induction support):

  Vector inside of loop cost: 24
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar iteration cost: 48
  Scalar outside cost: 0
  Vector outside cost: 0
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 0
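
As a rough check on these numbers, here is a minimal C sketch that compares
the total scalar and vector cost for n iterations.  The struct, the function
names and the vectorization factor of 1 are assumptions made for
illustration; this is not GCC's cost model code, which applies further
adjustments.

```c
#include <assert.h>

/* Hypothetical container for the figures printed in the dump. */
struct loop_costs {
  long vec_inside;      /* "Vector inside of loop cost" */
  long vec_outside;     /* "Vector outside cost" */
  long scalar_iter;     /* "Scalar iteration cost" */
  long scalar_outside;  /* "Scalar outside cost" */
};

/* Total cost of running n iterations of the scalar loop. */
long scalar_total(const struct loop_costs *c, long n)
{
  assert(n >= 0);
  return c->scalar_outside + c->scalar_iter * n;
}

/* Total cost of the vector loop when each vector iteration covers vf
   scalar iterations.  Leftover iterations (n % vf) are ignored here.  */
long vector_total(const struct loop_costs *c, long n, long vf)
{
  assert(n >= 0 && vf > 0);
  return c->vec_outside + c->vec_inside * (n / vf);
}

/* Values copied from the dump: 24, 0, 48, 0. */
const struct loop_costs dump_costs = { 24, 0, 48, 0 };
```

With these inputs and an assumed vf of 1, the vector loop costs 24 per
iteration against 48 for the scalar loop and neither has any outside cost,
so the vector version is cheaper from the first iteration on.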

vectorized to

.L2:
        movdqa  %xmm0, %xmm4
        movdqa  %xmm1, %xmm3
        paddq   %xmm2, %xmm0
        addq    $32, %rdi
        movups  %xmm4, -32(%rdi)
        paddq   %xmm2, %xmm1
        movups  %xmm3, -16(%rdi)
        cmpq    %rdi, %rax
        jne     .L2

There is a missed optimization here: we create two (identical) IVs for the
induction (late FRE runs in "simple" mode and thus does not get rid of them
as equivalent), and the IVs are odd (hence the extra moves), possibly because
out-of-SSA cannot coalesce them due to the constant initial values:

  # vect_vec_iv_.7_1 = PHI <{ 0, 0 }(2), _19(3)>
  # vect_vec_iv_.8_18 = PHI <{ 0, 0 }(2), _17(3)>

and the tricks may not apply because of the vector types.  I'll take this bug.
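
To illustrate the duplicated IV, here is a minimal C model of the loop shape.
It is a sketch under assumptions, not the testcase from this bug: the
`struct quad` layout (four 64-bit fields, 32 bytes, chosen to match the
32-byte stride in the assembly), the step of { 1, 1 } and all function names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type: 32 bytes, i.e. two 16-byte vector stores. */
struct quad { int64_t a, b, c, d; };

/* Scalar form: every field receives the loop counter. */
void fill_scalar(struct quad *p, size_t n)
{
  for (size_t i = 0; i < n; i++)
    p[i].a = p[i].b = p[i].c = p[i].d = (int64_t)i;
}

/* Models the generated loop: two 2-lane IVs with the same initial value
   { 0, 0 } and the same step, each feeding one 16-byte store.  */
void fill_two_ivs(struct quad *p, size_t n)
{
  int64_t iv0[2] = { 0, 0 };
  int64_t iv1[2] = { 0, 0 };
  const int64_t step[2] = { 1, 1 };   /* assumed step value */

  for (size_t i = 0; i < n; i++) {
    p[i].a = iv0[0]; p[i].b = iv0[1];   /* first store  */
    p[i].c = iv1[0]; p[i].d = iv1[1];   /* second store */
    for (int l = 0; l < 2; l++) {
      iv0[l] += step[l];
      iv1[l] += step[l];
    }
  }
}

/* What removing the redundant IV would give: one IV stored twice. */
void fill_one_iv(struct quad *p, size_t n)
{
  int64_t iv[2] = { 0, 0 };
  const int64_t step[2] = { 1, 1 };   /* assumed step value */

  for (size_t i = 0; i < n; i++) {
    p[i].a = iv[0]; p[i].b = iv[1];
    p[i].c = iv[0]; p[i].d = iv[1];
    for (int l = 0; l < 2; l++)
      iv[l] += step[l];
  }
}
```

All three functions write the same values, so in this model the second IV
carries no extra information; dropping it would save one vector add and one
register per iteration.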
