https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64404

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, this also shows the (known :/) wrong-code issue we have with same-dr
(same data reference) handling and mixed SLP / loop vectorization.  The
vectorizer is fed

  <bb 3>:
  # d_16 = PHI <d_13(4), 0(2)>
  # ivtmp_15 = PHI <ivtmp_14(4), 1024(2)>
  _4 = X[d_16].l;
  _5 = X[d_16].h;
  _6 = _4 + _5;
  Y[d_16].l = _6;
  Y[d_16].h = _6;
  Z[d_16].l = _4;
  _11 = X[d_16].h;
  Z[d_16].h = _11;
  d_13 = d_16 + 1;
  ivtmp_14 = ivtmp_15 - 1;
  if (ivtmp_14 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

where the Y = X op is SLPed but the Z = X one fails to SLP because of the
CSEd load (thus SLP analysis figures that the store has gaps).  When
trying to loop vectorize the Z = X op (by unrolling the loop once)
we arrive at the DEF for _11, which unfortunately is already vectorized
with SLP, and the vectorized stmt is the one with the load permutation
applied.
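
For readers without the testcase at hand, a C loop of roughly the following
shape would lower to GIMPLE like the dump above.  This is a reconstruction
from the dump, not necessarily the reporter's source: the struct name, the
int element type, the function names and the input values are assumptions;
only the fields .l and .h, the arrays X, Y, Z and the trip count 1024 come
from the dump.

```c
#include <assert.h>

#define N 1024                      /* trip count seen in the dump */

/* Hypothetical type: the dump shows only the fields .l and .h. */
struct pair { int l, h; };

struct pair X[N], Y[N], Z[N];

/* Loop shape inferred from the GIMPLE: Y gets the sum in both fields,
   Z is a field-wise copy of X.  noinline (a GCC attribute) keeps the
   loop from being folded into the caller. */
__attribute__((noinline)) void
compute (void)
{
  for (int d = 0; d < N; d++)
    {
      Y[d].l = X[d].l + X[d].h;
      Y[d].h = Y[d].l;
      Z[d].l = X[d].l;
      Z[d].h = X[d].h;
    }
}

/* Fill X with distinct .l/.h values (assumed inputs), run the loop and
   compare every element against the scalar result.  A miscompiled loop
   trips one of the asserts. */
void
check_compute (void)
{
  for (int d = 0; d < N; d++)
    {
      X[d].l = d;
      X[d].h = d + 1;
    }

  compute ();

  for (int d = 0; d < N; d++)
    {
      assert (Y[d].l == 2 * d + 1);
      assert (Y[d].h == 2 * d + 1);
      assert (Z[d].l == d);
      assert (Z[d].h == d + 1);   /* the store fed by _11 in the dump */
    }
}
```

Calling check_compute () from main and building with -O2 -ftree-vectorize
on an affected compiler is one way to probe for the problem; whether the
mixed SLP / loop path is actually taken depends on the target and the GCC
version.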

We really are not prepared to vectorize this loop!

In fact the second X.h load shouldn't be considered a grouped load, but
nothing "splits" it apart again.
