https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99394
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is a loop-carried data dependence which we can't handle (we avoid creating those from PRE, but here it appears in the source itself). I wonder how LLVM handles this (pre/post vectorization IL). Specifically, a 'carry around variable' is something we don't handle.

Can you somehow extract a compilable testcase (with just this kernel)?

Looking at the source, peeling a single iteration (to get rid of the initial value), undoing the PRE and then vectorizing

  for (int i = 1; i < LEN_1D; i++) {
    a[i] = (b[i] + b[i-1]) * (real_t).5;
  }

would likely result in optimal code.

The assembly from clang doesn't look optimal to me. LLVM likely materializes 'x' as a temporary array, vectorizing

  x[0] = b[LEN_1D-1];
  for (int i = 0; i < LEN_1D; i++) {
    a[i] = (b[i] + x[i]) * (real_t).5;
    x[i+1] = b[i];
  }

and then somehow (like we handle OMP simd lane arrays?) uses two vectors as a sliding window over x[]. At least the standard strategy for these kinds of dependences is to get "rid" of them by turning them into data dependences and then hope for the best.
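The kernel itself is not quoted in this comment, so the sketch below assumes its shape from the two rewrites above: a scalar 'x' starts at b[LEN_1D-1] and then carries b[i-1] from one iteration to the next. It puts that assumed loop next to the two source-level rewrites (peel one iteration and undo the PRE; expand 'x' into a temporary array) so their equivalence can be checked. LEN_1D = 16, real_t = float and the function names carry_around, peeled and expanded are placeholders chosen for illustration, not taken from the testcase.

```c
/* Placeholders for illustration; the real testcase's values are not shown. */
#define LEN_1D 16
typedef float real_t;

/* Assumed original kernel: 'x' is a carry-around variable, i.e. a
   loop-carried scalar dependence initialised from b[LEN_1D-1]. */
static void carry_around(real_t *a, const real_t *b)
{
    real_t x = b[LEN_1D - 1];
    for (int i = 0; i < LEN_1D; i++) {
        a[i] = (b[i] + x) * (real_t).5;
        x = b[i];
    }
}

/* Rewrite 1: peel iteration 0 (the only one that uses the initial value),
   then undo the PRE, since x == b[i-1] in every remaining iteration. */
static void peeled(real_t *a, const real_t *b)
{
    a[0] = (b[0] + b[LEN_1D - 1]) * (real_t).5;
    for (int i = 1; i < LEN_1D; i++)
        a[i] = (b[i] + b[i - 1]) * (real_t).5;
}

/* Rewrite 2: scalar expansion, materialising 'x' as a temporary array so
   the scalar dependence becomes an ordinary data dependence on x[]. */
static void expanded(real_t *a, const real_t *b)
{
    real_t x[LEN_1D + 1];
    x[0] = b[LEN_1D - 1];
    for (int i = 0; i < LEN_1D; i++) {
        a[i] = (b[i] + x[i]) * (real_t).5;
        x[i + 1] = b[i];
    }
}
```

All three compute the same additions on the same operands in the same order, so their results should match exactly. This only demonstrates the source-level transformations; whether GCC or clang actually vectorizes either form is not checked here.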