https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115192

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm looking into the first issue.  Interesting fact:

> /space/rguenther/install/gcc-14.1/bin/g++ t.C -O3 -fopt-info-vec
> -fno-tree-slp-vectorize --param vect-epilogues-nomask=0
t.C:7:21: optimized: loop vectorized using 16 byte vectors
t.C:7:21: optimized: loop versioned for vectorization because of possible aliasing
rguenther@localhost:/tmp> ./a.out
> /space/rguenther/install/gcc-14.1/bin/g++ t.C -O3 -fopt-info-vec
> -fno-tree-slp-vectorize --param vect-epilogues-nomask=1
t.C:7:21: optimized: loop vectorized using 16 byte vectors
t.C:7:21: optimized: loop versioned for vectorization because of possible aliasing
t.C:7:21: optimized: loop vectorized using 8 byte vectors
rguenther@localhost:/tmp> ./a.out
Aborted (core dumped)

so avoiding the vectorized epilog fixes this (I've also placed #pragma GCC
novector on the loop in main and noipa on foo).

C testcase:

typedef float float4_t __attribute__((vector_size(4 * sizeof(float))));

void __attribute__((noipa))
foo(int n, const float *d, float4_t * __restrict a)
{
  for (int y = 1; y < n; y++)
    for (int c = 0; c < 2; c++)
      a[y * n][c] = d[y * n] + a[(y - 1) * n][c];
}

int main()
{
  const int n = 3;
  float d[n*n];
  float4_t a[n*n];
#pragma GCC novector
  for (int i = 0; i < n * n; ++i)
    d[i] = i;
  foo(n, d, a);
  if (a[6][1] != 9)
    __builtin_abort();
}
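
For reference, the value 9 checked in main can be worked out with a plain scalar version of the same recurrence. The sketch below is not part of the report: `foo_scalar` and `expected_a6_lane1` are hypothetical names, the `float4_t` rows are modelled as ordinary `float[4]` arrays, and `a` is zero-initialized, which is an assumption (the testcase as posted does not set `a[0]` explicitly).

```c
/* Scalar model of foo(): same indexing and recurrence as the testcase,
   but each float4_t row is a plain float[4] (assumption, so that no
   vector extension is needed to follow the arithmetic). */
void foo_scalar(int n, const float *d, float (*a)[4])
{
  for (int y = 1; y < n; y++)
    for (int c = 0; c < 2; c++)
      a[y * n][c] = d[y * n] + a[(y - 1) * n][c];
}

/* Hypothetical helper: runs the n = 3 case and returns a[6][1]. */
float expected_a6_lane1(void)
{
  enum { N = 3 };
  float d[N * N];
  float a[N * N][4] = {{0}};   /* assumption: a starts out as all zeros */

  for (int i = 0; i < N * N; ++i)
    d[i] = (float)i;

  /* y = 1: a[3][c] = d[3] + a[0][c]
     y = 2: a[6][c] = d[6] + a[3][c] */
  foo_scalar(N, d, a);
  return a[6][1];
}
```

Under that zero-start assumption the chain is a[3][1] = 3 + 0 and a[6][1] = 6 + 3, so a correct compilation yields 9 and the abort in the vect-epilogues-nomask=1 run indicates a different value was stored.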