https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119586

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
So, even the GCC 14 branch figures the access is aligned:

t.c:7:23: note:   === vect_analyze_data_refs_alignment ===
t.c:7:23: note:   recording new base alignment for &e
  alignment:    16
  misalignment: 0
  based on:     e[b_17][c_18][0] = 1;
t.c:7:23: note:   vect_compute_data_ref_alignment:
t.c:7:23: missed:   misalign = 0 bytes of ref e[b_17][c_18][0]
t.c:7:23: note:   vect_compute_data_ref_alignment:
t.c:7:23: missed:   misalign = 0 bytes of ref e[b_17][c_18][4]
...
t.c:7:23: note:   ==> examining statement: e[b_17][c_18][0] = 1;
t.c:7:23: note:   vect_is_simple_use: operand 1, type of def: constant
t.c:7:23: note:   vect_model_store_cost: aligned.
t.c:7:23: note:   vect_model_store_cost: inside_cost = 24, prologue_cost = 0 .
1 2 times vector_store costs 24 in body
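
(Output like the above is from the vectorizer details dump, e.g.

  gcc -O2 -ftree-vectorize -fdump-tree-vect-details t.c

with options assumed here, not taken from the PR.)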

but the VMAT_STRIDED_SLP case always used element-aligned accesses:

          ltype = build_aligned_type (ltype, TYPE_ALIGN (elem_type));

while with r15-8047 I was optimistic (well, I wanted to catch bugs...)
and we now honor dr_aligned:

          unsigned align;
          if (alignment_support_scheme == dr_aligned)
            align = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info));
          else
            align = dr_alignment (vect_dr_behavior (vinfo, first_dr_info));
          /* Alignment is at most the access size if we do multiple stores.  */
          if (nstores > 1)
            align = MIN (tree_to_uhwi (TYPE_SIZE_UNIT (ltype)), align);
          ltype = build_aligned_type (ltype, align * BITS_PER_UNIT);
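
To illustrate the difference, a minimal GNU C sketch (the V2SI piece type
and the alignments are assumed, this is not the PR's testcase): lowering
the access type's alignment via a typedef mirrors what build_aligned_type
does to ltype.

  typedef int v2si __attribute__((vector_size (8)));    /* natural align 8 */
  typedef v2si v2si_elem __attribute__((aligned (4)));  /* element align   */

  void
  store_piece_old (int *p, v2si v)
  {
    /* Pre-r15-8047: the piece only claims element alignment, so
       strict-alignment targets expand this as an unaligned store --
       conservative but always safe.  */
    *(v2si_elem *) p = v;
  }

  void
  store_piece_new (int *p, v2si v)
  {
    /* With dr_aligned honored (and the nstores clamp, e.g.
       MIN (8, 16) == 8): the piece claims its full alignment, which
       allows an aligned store but breaks if the claim does not hold
       for every copy.  */
    *(v2si *) p = v;
  }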

Ah, so what we do during alignment analysis is compute

  /* Similarly we can only use base and misalignment information relative to
     an innermost loop if the misalignment stays the same throughout the
     execution of the loop.  As above, this is the case if the stride of
     the dataref evenly divides by the alignment.  */
  else
    {
      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
      step_preserves_misalignment_p
        = multiple_p (DR_STEP_ALIGNMENT (dr_info->dr) * vf, vect_align_c);

      if (!step_preserves_misalignment_p && dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "step doesn't divide the vector alignment.\n");

so it's correct that e[b_17][c_18][0] is aligned, but the 2nd copy
(we use VF == 2), e[b_17][c_18+1][0], is not necessarily aligned.  Note
we are technically not "strided" here (as in, variable step).
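
To make the arithmetic concrete, a small sketch (only VF == 2 is from the
dump; the 16-byte vector alignment and the 8-byte step are assumed
numbers):

  #include <stdio.h>

  int
  main (void)
  {
    unsigned vect_align_c = 16;  /* assumed vector alignment   */
    unsigned step_align = 8;     /* assumed DR_STEP_ALIGNMENT  */
    unsigned vf = 2;             /* VF from the dump           */

    /* The check above: misalignment is invariant across vector
       iterations because step * VF is a multiple of the alignment.  */
    printf ("invariant across iterations: %d\n",
            step_align * vf % vect_align_c == 0);        /* prints 1 */

    /* But the 2nd copy, at base + step, is not vector-aligned.  */
    printf ("2nd copy aligned: %d\n",
            step_align % vect_align_c == 0);             /* prints 0 */
    return 0;
  }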

But that also means the upfront alignment computation for VMAT_STRIDED_SLP
is still wrong for targets that do not support unaligned accesses (it has
been wrong before, of course).

I'm going to revert the optimistic treatment of dr_aligned for now.
