https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
But the issue seems to be

t.c:3:22: note: ==> examining statement: _34 = *pix1_19;
t.c:3:22: missed: permutation requires at least three vectors _34 = *pix1_19;
t.c:3:22: missed: unsupported load permutation
t.c:6:24: missed: not vectorized: relevant stmt not supported: _34 = *pix1_19;
t.c:3:22: note: removing SLP instance operations starting from: *_44 = _45;
t.c:3:22: missed: unsupported SLP instances
t.c:3:22: note: re-trying with SLP disabled

so SLP vectorization is failing because of unsupported permutes with the
larger vector size, and the non-SLP case is failing with

t.c:3:22: missed: loop does not have enough iterations to support vectorization.
t.c:3:22: note: ***** Analysis failed with vector mode V16QI

so I don't see the connection with the pattern.  Only for V8QI I see it
remotely mentioned, but there we have _different_ patterns matched...

I think the permute issue is "old" and goes away if you make it strided-slp
by incrementing pix1/2 by a non-constant; then we can load the vector by
char[4] pieces.  We just don't consider that possibility when instead trying
"strided" (with gap at the end).

The widen patterns are a red herring here I think.
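
For illustration, the strided-slp variant suggested above might look roughly like the sketch below.  This is not the testcase from the report: the function name wdiff_strided, the 4x4 shape, the element types and the stride1/stride2 parameters are all assumptions, chosen only to show pix1/pix2 being advanced by a non-constant amount per outer iteration.

```c
#include <stdint.h>

/* Hypothetical 4x4 pixel-difference kernel (names and shapes are
   assumptions, not taken from the bug report).  The row pointers are
   advanced by run-time strides rather than by a compile-time constant,
   so each row is a separate group of four adjacent chars.  */
void wdiff_strided(int16_t d[16],
                   const uint8_t *restrict pix1, int stride1,
                   const uint8_t *restrict pix2, int stride2)
{
    for (int y = 0; y < 4; y++) {
        /* Four adjacent bytes per row: the char[4] piece mentioned above.  */
        for (int x = 0; x < 4; x++)
            d[x + y * 4] = (int16_t)(pix1[x] - pix2[x]);
        pix1 += stride1;   /* non-constant increment */
        pix2 += stride2;   /* non-constant increment */
    }
}
```

Whether a given GCC version actually vectorizes this form would need to be checked, for example by compiling with -O3 -fopt-info-vec-missed and comparing the notes against the ones quoted above.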