https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116125
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Sandiford from comment #5)
> (In reply to Richard Biener from comment #3)
> > We document
> > 
> >   class dr_with_seg_len
> >   {
> >   ...
> >     /* The minimum common alignment of DR's start address, SEG_LEN and
> >        ACCESS_SIZE.  */
> >     unsigned int align;
> > 
> > but here we have access_size == 1 and align == 4.  It's also said
> > 
> >   /* All addresses involved are known to have a common alignment ALIGN.
> >      We can therefore subtract ALIGN from an exclusive endpoint to get
> >      an inclusive endpoint.  In the best (and common) case, ALIGN is the
> >      same as the access sizes of both DRs, and so subtracting ALIGN
> >      cancels out the addition of an access size.  */
> >   unsigned int align = MIN (dr_a.align, dr_b.align);
> >   poly_uint64 last_chunk_a = dr_a.access_size - align;
> >   poly_uint64 last_chunk_b = dr_b.access_size - align;
> > 
> > and
> > 
> >   We also know
> >   that last_chunk_b <= |step|; this is checked elsewhere if it isn't
> >   guaranteed at compile time.
> > 
> > step == 4, but last_chunk_a/b are -3U.  I couldn't find the "elsewhere"
> > to check what we validate there.
> The assumption that access_size is a multiple of align is crucial, so like
> you say, it all falls apart if that doesn't hold.  In this case, that means
> that last_chunk_* should never have been negative.
> 
> But I agree that the “elsewhere” doesn't seem to exist after all.  That is,
> the step can be arbitrarily smaller than the access size.  Somewhat
> relatedly, we seem to vectorise:
> 
>   struct s { int x; } __attribute__((packed));
> 
>   void f (char *xc, char *yc, int z)
>   {
>     for (int i = 0; i < 100; ++i)
>       {
>         struct s *x = (struct s *) xc;
>         struct s *y = (struct s *) yc;
>         x->x += y->x;
>         xc += z;
>         yc += z;
>       }
>   }
> 
> on aarch64 even with -mstrict-align -fno-vect-cost-model, generating
> elementwise accesses that assume that the ints are aligned.  E.g.:
> 
>   _71 = (char *) ivtmp.19_21;
>   _30 = ivtmp.29_94 - _26;
>   _60 = (char *) _30;
>   _52 = __MEM <int> ((int *)_71);
>   _53 = (char *) ivtmp.25_18;
>   _54 = __MEM <int> ((int *)_53);
>   _55 = (char *) ivtmp.26_16;
>   _56 = __MEM <int> ((int *)_55);
>   _57 = (char *) ivtmp.27_88;
>   _58 = __MEM <int> ((int *)_57);
>   _59 = _Literal (int [[gnu::vector_size(16)]]) {_52, _54, _56, _58};
> 
> But the vector loop is executed even for a step of 1 (byte), provided that x
> and y don't overlap.

I think this is due to a similar issue to what you noticed wrt dr_aligned
and how we then emit aligned loads, instead of checking the byte alignment
against the access size we emit - I think we don't consider misaligned
elements at all when code-generating element accesses.

We do see

t.c:5:21: note: vect_compute_data_ref_alignment:
t.c:5:21: missed: step doesn't divide the vector alignment.
t.c:5:21: missed: Unknown alignment for access: MEM[(struct s *)xc_21].x
t.c:5:21: note: vect_compute_data_ref_alignment:
t.c:5:21: missed: step doesn't divide the vector alignment.
t.c:5:21: missed: Unknown alignment for access: MEM[(struct s *)yc_22].x
t.c:5:21: note: vect_compute_data_ref_alignment:
t.c:5:21: missed: step doesn't divide the vector alignment.
t.c:5:21: missed: Unknown alignment for access: MEM[(struct s *)xc_21].x

I'm splitting this out to another PR.
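
For reference, a minimal standalone sketch of the wrap-around discussed
above (plain unsigned int standing in for poly_uint64; the variable names
are mine): with access_size == 1 and align == 4 the subtraction wraps to
-3U, which is far larger than step == 4, so the documented precondition
last_chunk_b <= |step| is silently violated.

  #include <stdio.h>

  int
  main (void)
  {
    /* Values from this PR: a 1-byte access whose DR claims 4-byte
       alignment.  */
    unsigned int access_size = 1;
    unsigned int align = 4;
    unsigned int step = 4;

    /* Mirrors "poly_uint64 last_chunk_a = dr_a.access_size - align;",
       with plain unsigned int instead of poly_uint64.  */
    unsigned int last_chunk = access_size - align;

    printf ("last_chunk = %u\n", last_chunk);   /* 4294967293, i.e. -3U  */
    printf ("last_chunk <= step: %s\n",
            last_chunk <= step ? "yes" : "no"); /* prints "no"  */
    return 0;
  }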
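
And a hypothetical driver for the reproducer above (the buffer sizes and
the alignment count are mine, not part of the testcase) illustrating why
entering the vector loop for a byte step of 1 is unsafe: exactly three of
every four consecutive byte addresses are misaligned for a 4-byte int,
yet the element-wise loads assume int alignment.

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  struct s { int x; } __attribute__((packed));

  void
  f (char *xc, char *yc, int z)
  {
    for (int i = 0; i < 100; ++i)
      {
        struct s *x = (struct s *) xc;
        struct s *y = (struct s *) yc;
        x->x += y->x;
        xc += z;
        yc += z;
      }
  }

  int
  main (void)
  {
    /* Hypothetical driver, not from the PR: two disjoint buffers and a
       byte step of 1, so the runtime alias check passes and the vector
       loop is entered.  */
    static char a[256], b[256];
    memset (b, 1, sizeof b);

    /* Of 100 consecutive byte addresses, exactly 25 are 4-byte aligned
       and 75 are not, regardless of the base alignment of A.  */
    int misaligned = 0;
    for (int i = 0; i < 100; ++i)
      if ((uintptr_t) (a + i) % 4 != 0)
        misaligned++;
    printf ("%d of 100 accesses misaligned for int\n", misaligned);

    /* On aarch64 with -mstrict-align -fno-vect-cost-model the
       vectorized loop performs 4-byte loads at those addresses.  */
    f (a, b, 1);
    return 0;
  }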