https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120089
--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
At least on x86_64 before r15-7533-g589d79e6268b05 we failed to vectorize this:

t.c:17:12: note: examining phi: _33 = PHI <0(20), _42(17)>
t.c:9:1: missed: not vectorized: relevant phi not supported: _33 = PHI <0(20), _42(17)>
t.c:17:12: missed: bad operation or unsupported loop bound
t.c:17:12: note: ***** Analysis failed with vector mode V2DI

(this is the PHI that misses the SLP discovery)

t.c:17:12: missed: can't vectorize early exit because the target doesn't support flag setting vector comparisons.
t.c:17:12: note: unsupported SLP instance starting from: if (patt_37 != 0)
t.c:17:12: missed: unsupported SLP instances
t.c:17:12: note: ***** Analysis failed with vector mode V8QI

(unsupported ptest)

So the issue was previously latent.

I did not yet spot the actual issue, the vectorization looks correct to my
eyes...  Note the main exit is the exit to the __builtin_trap().

With SSE4 we exit the main vector loop after 3 vector iterations via the
early exit (to the non-trap)

(gdb) p data
$20 = {d = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 0 <repeats 76 times>}}

        movq    %xmm3, %rdx
        movd    %xmm1, %eax

both IVs are 12 which I think is correct, but then the destination pointer
seems mishandled - that somehow gets taken from the original scalar IV and
thus it doesn't have VF == 4 imposed.  Seems like a missed early-exit
forced-live IV, but it's an address IV.
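
To make the suspected mismatch concrete, here is a small hand-written C model of a vector loop with an early exit. It is neither the testcase from this PR nor the code GCC generates. The names `run_vector_loop`, `exit_state` and `dst_step` are hypothetical, and modelling the mishandled destination pointer as an address IV that advances by the scalar step (1) per vector iteration is only one reading of "doesn't have VF == 4 imposed".

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 4 };  /* vectorization factor, matching the SSE4 case above */

/* IV values handed to the scalar code after the early exit
   (hypothetical struct, for illustration only). */
struct exit_state {
    size_t i;        /* counting IV */
    size_t dst_off;  /* destination address IV, as an element offset */
};

/* Model of a vector loop with an early exit.  Each vector iteration
   inspects VF elements of src and leaves the loop if any equals `stop`.
   On the early exit the IVs keep the values they had at the start of
   that vector iteration, so scalar code can redo the block.
   `dst_step` is how far the address IV advances per vector iteration:
   VF is the consistent choice, 1 models the assumed bug.  */
static struct exit_state
run_vector_loop(const int *src, size_t n, int stop, size_t dst_step)
{
    struct exit_state s = { 0, 0 };
    assert(n % VF == 0);  /* keep the model simple: no partial vectors */

    while (s.i < n) {
        int hit = 0;
        for (size_t l = 0; l < VF; l++)
            hit |= (src[s.i + l] == stop);
        if (hit)
            break;              /* early exit, IVs not yet advanced */
        s.i += VF;              /* counting IV: VF elements per iteration */
        s.dst_off += dst_step;  /* address IV: should also be VF */
    }
    return s;
}
```

With `src[k] == k` for 24 elements and `stop == 13`, the model leaves through the early exit after 3 full vector iterations with `i == 12`. Passing `dst_step == VF` gives `dst_off == 12`, in step with the counting IV, whereas `dst_step == 1` leaves `dst_off` at 3, so any store through the destination pointer after the exit would land at the wrong element.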