https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119586
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
So, even the 14 branch figures the access is aligned:

t.c:7:23: note: === vect_analyze_data_refs_alignment ===
t.c:7:23: note: recording new base alignment for &e
  alignment:    16
  misalignment: 0
  based on:     e[b_17][c_18][0] = 1;
t.c:7:23: note: vect_compute_data_ref_alignment:
t.c:7:23: missed: misalign = 0 bytes of ref e[b_17][c_18][0]
t.c:7:23: note: vect_compute_data_ref_alignment:
t.c:7:23: missed: misalign = 0 bytes of ref e[b_17][c_18][4]
...
t.c:7:23: note: ==> examining statement: e[b_17][c_18][0] = 1;
t.c:7:23: note: vect_is_simple_use: operand 1, type of def: constant
t.c:7:23: note: vect_model_store_cost: aligned.
t.c:7:23: note: vect_model_store_cost: inside_cost = 24, prologue_cost = 0 .
1 2 times vector_store costs 24 in body

but the VMAT_STRIDED_SLP case always used element-aligned accesses:

  ltype = build_aligned_type (ltype, TYPE_ALIGN (elem_type));

while I was optimistic (well, I wanted to catch bugs...) with r15-8047 and
now honor dr_aligned:

  unsigned align;
  if (alignment_support_scheme == dr_aligned)
    align = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info));
  else
    align = dr_alignment (vect_dr_behavior (vinfo, first_dr_info));
  /* Alignment is at most the access size if we do multiple stores.  */
  if (nstores > 1)
    align = MIN (tree_to_uhwi (TYPE_SIZE_UNIT (ltype)), align);
  ltype = build_aligned_type (ltype, align * BITS_PER_UNIT);

Ah, so what we do upon alignment analysis is compute

  /* Similarly we can only use base and misalignment information relative to
     an innermost loop if the misalignment stays the same throughout the
     execution of the loop.  As above, this is the case if the stride of
     the dataref evenly divides by the alignment.
  */
  else
    {
      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
      step_preserves_misalignment_p
        = multiple_p (DR_STEP_ALIGNMENT (dr_info->dr) * vf, vect_align_c);

      if (!step_preserves_misalignment_p && dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "step doesn't divide the vector alignment.\n");

so it's correct that e[b_17][c_18][0] is aligned, but the 2nd copy (we use
VF == 2), e[b_17][c_18+1][0], is not necessarily.  Note we are technically
not "strided" (as in, variable step).  But that also means the upfront
compute for VMAT_STRIDED_SLP is still wrong for targets not supporting
unaligned accesses (it has been wrong before, of course).  I'm going to
revert the optimistic treatment of dr_aligned for now.