https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96109
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> OK, it's indeed wrong which means we'd fall through the checks that prevent
> SPARC from vectorizing here but then we'll create an unaligned access
> anyway (because VMAT_STRIDED_SLP is too lazy to figure out appropriate
> alignment).  We're also assuming element alignment there.
>
> static double x[1024], y[1024];
>
> void __attribute__((noipa))
> foo ()
> {
>   for (int i = 0; i < 511; ++i)
>     {
>       x[2*i] = y[1022 - 2*i - 1];
>       x[2*i+1] = y[1022 - 2*i];
>     }
> }
>
> int main()
> {
>   for (int i = 0; i < 1024; ++i)
>     x[i] = 0, y[i] = i;
>   foo ();
>   for (int i = 0; i < 1022; ++i)
>     if (x[i] != y[1022 - (i^1)])
>       __builtin_abort ();
>   if (x[1022] != 0 || x[1023] != 0)
>     __builtin_abort ();
>   return 0;
> }

So for example on aarch64-linux with -O3 -mstrict-align -fno-vect-cost-model
we analyze this as

x.c:6:21: note:   vect_model_load_cost: aligned.
x.c:6:21: note:   vect_model_load_cost: inside_cost = 3, prologue_cost = 0 .

but correctly emit an unaligned load (dump with -gimple to see the alignment)

  _20 = &y + 8168ul;
...
  _19 = __PHI (__BB5: _16, __BB2: _20);
...
  _15 = __MEM <vector(2) double, 64> ((double *)_19);

and correctly (but not optimally) expand it via extract-bit-field to

;; _15 = MEM <vector(2) double> [(double *)ivtmp_19];

(insn 12 11 13 (clobber (reg:V2DF 95 [ _15 ])) "x.c":8:17 -1
     (nil))

(insn 13 12 14 (set (subreg:DI (reg:V2DF 95 [ _15 ]) 0)
        (mem:DI (reg:DI 92 [ ivtmp.13 ]) [1 MEM <vector(2) double> [(double *)ivtmp_19]+0 S8 A64])) "x.c":8:17 -1
     (nil))

(insn 14 13 0 (set (subreg:DI (reg:V2DF 95 [ _15 ]) 8)
        (mem:DI (plus:DI (reg:DI 92 [ ivtmp.13 ])
                (const_int 8 [0x8])) [1 MEM <vector(2) double> [(double *)ivtmp_19]+8 S8 A64])) "x.c":8:17 -1
     (nil))

so we're not getting a runtime fail here but clearly the vectorizer's idea
of alignment of the access is bogus (and if VMAT_STRIDED_SLP were less
lazy and trusted the computed alignment info we'd miscompile).

I guess we're getting away with this because RTL expansion has fallback
code to correctly expand misaligned accesses on strict-align targets.
But clearly it's not what the vectorizer costs (it also oddly costs
a vec_construct when you enable the cost model but doesn't emit any
in the end).
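
For reference, the offset arithmetic behind the 64-bit alignment in the GIMPLE dump can be sketched as below. This is only an illustration, not code from the vectorizer: the helper names (load_offset, count_vector_aligned_loads) are made up here, and it assumes sizeof (double) == 8 and a 16-byte V2DF vector. The first two-element load starts at y[1021], i.e. byte offset 8168 from &y, which is 8 modulo 16. Each iteration steps by 16 bytes, so no iteration gets a 16-byte-aligned load even if y itself were 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Constants taken from the testcase; VEC_BYTES assumes a V2DF vector
   of two 8-byte doubles.  */
enum { N_ITERS = 511, LAST_INDEX = 1022, VEC_BYTES = 16 };

/* Byte offset from &y[0] of the two-element load in iteration i.
   The lower-addressed element read is y[1022 - 2*i - 1].
   (Hypothetical helper, for illustration only.)  */
static size_t
load_offset (int i)
{
  assert (i >= 0 && i < N_ITERS);
  return (size_t) (LAST_INDEX - 2 * i - 1) * sizeof (double);
}

/* Count the iterations whose load offset is a multiple of VEC_BYTES.  */
static int
count_vector_aligned_loads (void)
{
  int aligned = 0;
  for (int i = 0; i < N_ITERS; ++i)
    if (load_offset (i) % VEC_BYTES == 0)
      ++aligned;
  return aligned;
}
```

With those assumptions load_offset (0) matches the 8168ul in the dump, and count_vector_aligned_loads () finds no vector-aligned iteration, which is consistent with element alignment being the best that can be claimed for this access.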