https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115192
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> I'm looking into the first issue.  Interesting fact:
>
> > /space/rguenther/install/gcc-14.1/bin/g++ t.C -O3 -fopt-info-vec
> > -fno-tree-slp-vectorize --param vect-epilogues-nomask=0
> t.C:7:21: optimized: loop vectorized using 16 byte vectors
> t.C:7:21: optimized: loop versioned for vectorization because of possible
> aliasing
> rguenther@localhost:/tmp> ./a.out
> > /space/rguenther/install/gcc-14.1/bin/g++ t.C -O3 -fopt-info-vec
> > -fno-tree-slp-vectorize --param vect-epilogues-nomask=1
> t.C:7:21: optimized: loop vectorized using 16 byte vectors
> t.C:7:21: optimized: loop versioned for vectorization because of possible
> aliasing
> t.C:7:21: optimized: loop vectorized using 8 byte vectors
> rguenther@localhost:/tmp> ./a.out
> Aborted (core dumped)
>
> so avoiding the vectorized epilog fixes this (I've also placed #pragma GCC
> novector on the loop in main and noipa on foo).

Actually with -fno-vect-cost-model even --param vect-epilogues-nomask=0 fails.

Since we are vectorizing

  for (int y = 1; y < n; y++)
    {
      a[y * n][0] = d[y * n] + a[(y - 1) * n][0];
      a[y * n][1] = d[y * n] + a[(y - 1) * n][1];
    }

with a VF of two this is a failure to identify the dependence between the
iterations, so possibly related to r11-6380 as well.

(compute_affine_dependence
  ref_a: BIT_FIELD_REF <*_37, 32, 0>, stmt_a: _38 = BIT_FIELD_REF <*_37, 32, 0>;
  ref_b: BIT_FIELD_REF <*_40, 32, 0>, stmt_b: BIT_FIELD_REF <*_40, 32, 0> = _41;
) -> dependence analysis failed

Creating dr for BIT_FIELD_REF <*_37, 32, 0>
analyze_innermost: success.
        base_address: a_23(D)
        offset from base address: 0
        constant offset from base address: 0
        step: (ssizetype) ((long unsigned int) n_20(D) * 16)
        base alignment: 16
        base misalignment: 0
        offset alignment: 128
        step alignment: 16
        base_object: BIT_FIELD_REF <*_37, 32, 0>
Creating dr for BIT_FIELD_REF <*_40, 32, 0>
analyze_innermost: success.
        base_address: (float4_t *) a_23(D) + (sizetype) n_20(D) * 16
        offset from base address: 0
        constant offset from base address: 0
        step: (ssizetype) ((long unsigned int) n_20(D) * 16)
        base alignment: 16
        base misalignment: 0
        offset alignment: 128
        step alignment: 16
        base_object: BIT_FIELD_REF <*_40, 32, 0>

and for reference

Creating dr for BIT_FIELD_REF <*_37, 32, 32>
analyze_innermost: success.
        base_address: a_23(D)
        offset from base address: 0
        constant offset from base address: 4
        step: (ssizetype) ((long unsigned int) n_20(D) * 16)
        base alignment: 16
        base misalignment: 0
        offset alignment: 128
        step alignment: 16
        base_object: BIT_FIELD_REF <*_37, 32, 32>

that looks sensible.  And 'a' is indeed properly aligned.

t.c:6:21: note: recording new base alignment for d_22(D) + (sizetype) n_20(D) * 4
  alignment:    4
  misalignment: 0
  based on:     _32 = *_31;
t.c:6:21: note: recording new base alignment for a_23(D)
  alignment:    16
  misalignment: 0
  based on:     _38 = BIT_FIELD_REF <*_37, 32, 0>;
t.c:6:21: note: recording new base alignment for (float4_t *) a_23(D) + (sizetype) n_20(D) * 16
  alignment:    16
  misalignment: 0
  based on:     BIT_FIELD_REF <*_40, 32, 0> = _41;
t.c:6:21: note:   vect_compute_data_ref_alignment:
t.c:6:21: missed:   step doesn't divide the vector alignment.
t.c:6:21: missed:   Unknown alignment for access: *_31
t.c:6:21: note:   vect_compute_data_ref_alignment:
t.c:6:21: missed:   Unknown alignment for access: BIT_FIELD_REF <*_37, 32, 0>
t.c:6:21: note:   vect_compute_data_ref_alignment:
t.c:6:21: missed:   Unknown alignment for access: BIT_FIELD_REF <*_40, 32, 0>
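To illustrate why a VF of two is invalid for the loop quoted above: iteration y reads a[(y - 1) * n], which iteration y - 1 has just stored, so the dependence distance is one iteration. If two iterations run in lockstep, the second lane loads a value before the first lane has written it. The C++ sketch below is a hypothetical, simplified model, not the actual testcase and not GCC's generated code. It approximates float4_t with std::array<float, 4>, and the names scalar, vf2_ignoring_dependence and compare are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for float4_t: four floats, 16 bytes per element,
// matching the step of n * 16 bytes in the dump (the real type may be a
// GCC vector type).
using float4 = std::array<float, 4>;

// Scalar semantics of the loop: iteration y reads a[(y - 1) * n], which
// iteration y - 1 has just stored (a dependence of distance 1).
void scalar(std::vector<float4>& a, const std::vector<float>& d, int n) {
  for (int y = 1; y < n; y++) {
    a[y * n][0] = d[y * n] + a[(y - 1) * n][0];
    a[y * n][1] = d[y * n] + a[(y - 1) * n][1];
  }
}

// Simplified model of a VF of two that misses the dependence: both lanes
// load their inputs before either lane stores its result.
void vf2_ignoring_dependence(std::vector<float4>& a,
                             const std::vector<float>& d, int n) {
  int y = 1;
  for (; y + 1 < n; y += 2) {
    // Loads for lanes y and y + 1 happen first ...
    float in0[2] = {a[(y - 1) * n][0], a[y * n][0]};
    float in1[2] = {a[(y - 1) * n][1], a[y * n][1]};
    // ... so lane y + 1 sees a[y * n] before lane y has updated it.
    for (int lane = 0; lane < 2; lane++) {
      a[(y + lane) * n][0] = d[(y + lane) * n] + in0[lane];
      a[(y + lane) * n][1] = d[(y + lane) * n] + in1[lane];
    }
  }
  for (; y < n; y++) {  // scalar epilogue for any leftover iteration
    a[y * n][0] = d[y * n] + a[(y - 1) * n][0];
    a[y * n][1] = d[y * n] + a[(y - 1) * n][1];
  }
}

// Runs both versions with a all zeros and d all ones and returns element 0
// of the last row written: {scalar result, VF-of-two result}.
std::pair<float, float> compare(int n) {
  assert(n >= 2);  // the loop needs at least one iteration
  std::vector<float4> a_ref(n * n, float4{}), a_vec(n * n, float4{});
  std::vector<float> d(n * n, 1.0f);
  scalar(a_ref, d, n);
  vf2_ignoring_dependence(a_vec, d, n);
  return {a_ref[(n - 1) * n][0], a_vec[(n - 1) * n][0]};
}
```

For example, compare(4) yields {3, 2}. The scalar loop accumulates three ones into a[12][0], while the lockstep model reads a[4][0] before it has been written and ends up one short.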