https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64404
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---

Ok, this also shows the (known :/) wrong-code issue we have with the same-dr and mixed SLP / loop vectorization. The vectorizer is fed

  <bb 3>:
    # d_16 = PHI <d_13(4), 0(2)>
    # ivtmp_15 = PHI <ivtmp_14(4), 1024(2)>
    _4 = X[d_16].l;
    _5 = X[d_16].h;
    _6 = _4 + _5;
    Y[d_16].l = _6;
    Y[d_16].h = _6;
    Z[d_16].l = _4;
    _11 = X[d_16].h;
    Z[d_16].h = _11;
    d_13 = d_16 + 1;
    ivtmp_14 = ivtmp_15 - 1;
    if (ivtmp_14 != 0)
      goto <bb 4>;
    else
      goto <bb 5>;

  <bb 4>:
    goto <bb 3>;

where the Y = X op is SLPed but the Z = X one fails to SLP because of the CSEd load (thus SLP analysis figures that the store has gaps). When trying to loop vectorize the Z = X op (by unrolling the loop once) we get at the DEF for _11, which unfortunately is already vectorized with SLP, and the vectorized stmt is the one with the load permutation applied. We really are not prepared to vectorize this loop!

In fact the second X.h load shouldn't be considered a grouped load, but nothing "splits" it apart again.
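For anyone trying to reproduce this, a C loop along the following lines should lower to roughly the GIMPLE in the dump. This is a hypothetical reconstruction, not the testcase attached to the PR: the element type `int`, the struct tag `pair` and the function name `foo` are assumptions; only the array names X/Y/Z, the `l`/`h` fields and the trip count of 1024 come from the dump. Whether a given GCC version produces exactly this dump (in particular the separate reload of `X[d].h` as `_11`) may depend on version and flags.

```c
/* Hypothetical reconstruction of the loop in the dump above.
   Assumptions: element type int, struct tag pair, function name foo. */
#define N 1024

struct pair { int l, h; };

struct pair X[N], Y[N], Z[N];

void foo (void)
{
  for (int d = 0; d < N; d++)
    {
      /* _4, _5, _6: the Y = X op that gets SLPed.  */
      int s = X[d].l + X[d].h;
      Y[d].l = s;
      Y[d].h = s;
      /* The Z = X op: in the dump Z[d].l reuses the CSEd load _4,
         while Z[d].h gets its own reload of X[d].h (_11).  */
      Z[d].l = X[d].l;
      Z[d].h = X[d].h;
    }
}
```

Compiling such a file with `-O3 -fdump-tree-vect-details` writes the vectorizer's SLP and loop-vectorization decisions to a dump file, which is where the interaction described above would show up.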