On Wed, 15 Sep 2021, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Tue, 14 Sep 2021, Richard Sandiford wrote:
> >
> >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > This changes us to maintain and compute (mis-)alignment info for
> >> > the first element of a group only rather than for each DR when
> >> > doing interleaving and for the earliest, first, or first in the SLP
> >> > node (or any pair or all three of those) when SLP vectorizing.
> >> >
> >> > For this to work out the easiest way I have changed the accessors
> >> > DR_MISALIGNMENT and DR_TARGET_ALIGNMENT to do the indirection to
> >> > the first element rather than adjusting all callers.
> >> > dr_misalignment is moved out-of-line and I'm not too fond of the
> >> > poly-int dances there (any hints?), but basically we are now
> >> > adjusting the first elements misalignment based on the DR_INIT
> >> > difference.
> >> >
> >> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> >> >
> >> > Richard.
> >> >
> >> > 2021-09-13  Richard Biener  <rguent...@suse.de>
> >> >
> >> >  * tree-vectorizer.h (dr_misalignment): Move out of line.
> >> >  (dr_target_alignment): New.
> >> >  (DR_TARGET_ALIGNMENT): Wrap dr_target_alignment.
> >> >  (set_dr_target_alignment): New.
> >> >  (SET_DR_TARGET_ALIGNMENT): Wrap set_dr_target_alignment.
> >> >  * tree-vect-data-refs.c (dr_misalignment): Compute and
> >> >  return the group members misalignment.
> >> >  (vect_compute_data_ref_alignment): Use SET_DR_TARGET_ALIGNMENT.
> >> >  (vect_analyze_data_refs_alignment): Compute alignment only
> >> >  for the first element of a DR group.
> >> >  (vect_slp_analyze_node_alignment): Likewise.
> >> > ---
> >> >  gcc/tree-vect-data-refs.c | 65 ++++++++++++++++++++++++---------------
> >> >  gcc/tree-vectorizer.h     | 24 ++++++++++-----
> >> >  2 files changed, 57 insertions(+), 32 deletions(-)
> >> >
> >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> >> > index 66e76132d14..b53d6a0b3f1 100644
> >> > --- a/gcc/tree-vect-data-refs.c
> >> > +++ b/gcc/tree-vect-data-refs.c
> >> > @@ -887,6 +887,36 @@ vect_slp_analyze_instance_dependence (vec_info 
> >> > *vinfo, slp_instance instance)
> >> >    return res;
> >> >  }
> >> >  
> >> > +/* Return the misalignment of DR_INFO.  */
> >> > +
> >> > +int
> >> > +dr_misalignment (dr_vec_info *dr_info)
> >> > +{
> >> > +  if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
> >> > +    {
> >> > +      dr_vec_info *first_dr
> >> > +        = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
> >> > +      int misalign = first_dr->misalignment;
> >> > +      gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> >> > +      if (misalign == DR_MISALIGNMENT_UNKNOWN)
> >> > +        return misalign;
> >> > +      poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr))
> >> > +                              - wi::to_poly_offset (DR_INIT 
> >> > (first_dr->dr)));
> >> > +      poly_int64 mispoly = misalign + diff.to_constant ().to_shwi ();
> >> > +      bool res = known_misalignment (mispoly,
> >> > +                                     
> >> > first_dr->target_alignment.to_constant (),
> >> > +                                     &misalign);
> >> > +      gcc_assert (res);
> >> > +      return misalign;
> >> 
> >> Yeah, not too keen on the to_constants here.  The one on diff looks
> >> redundant -- you could just use diff.force_shwi () instead, and
> >> keep everything poly_int.
> >>
> >> For the known_misalignment I think we should use:
> >> 
> >>    if (!can_div_trunc_p (mispoly, first_dr->target_alignment,
> >>                     &quotient, &misalign))
> >>      misalign = DR_MISALIGNMENT_UNKNOWN;
> >>    return misalign;
> >> 
> >> There are then no to_constant assumptions.
> >
> > OK, note that group analysis does
> >
> >           /* Check that the DR_INITs are compile-time constants.  */
> >           if (TREE_CODE (DR_INIT (dra)) != INTEGER_CST
> >               || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
> >             break;
> >
> >           /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
> >           HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
> >           HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
> >
> > so I'm confident my variant was "correct", but it still was ugly.
> 
> Ah, OK.  In that case I don't mind the original version, but it would be
> good to have a comment above the to_constant saying where the condition
> is enforced.
> 
> I'm just trying to avoid to_constant calls with no comment to explain
> them, and with no nearby is_constant call.  Otherwise it could end up
> a bit like tree_to_uhwi, where sometimes tree_fits_uhwi_p really has
> been checked earlier (not always obvious where) and sometimes
> tree_to_uhwi is just used out of hope, to avoid having to think about
> the alternative.
> 
> > There's also the issue that target_alignment is poly_uint64 but
> > misalignment is signed int.
> >
> > Note that can_div_trunc_p seems to require a poly_uint64 remainder,
> > I'm not sure what to do with that, so I used is_constant.
> 
> Ah, yeah, forgot about that sorry.  I guess in that case, using
> is_constant on first_dr->target_alignment and sticking with
> known_misalignment would make sense.
> 
> > Btw, to what value do we want to align with variable sized vectors?
> > We are aligning globals according to target_alignment but only
> > support that for constant alignment.  Most users only want to
> > distinguish between aligned or not aligned and the actual misalignment
> > is only used to seed the SSA alignment info, so I suppose being
> > too conservative in dr_misalignment for variable size vectors isn't
> > too bad if we correctly identify fully aligned accesses?
> 
> In practice we don't require more than element alignment for VLA SVE
> as things stand.  However, there's support for using fully-predicated
> loops in which the first loop iteration aligns by using leading
> inactive lanes where possible (e.g. start with 1 inactive lane
> for a misalignment of 1).  In principle that would work for VLA too,
> we just haven't implemented it yet.
> 
> > I'm now testing a variant that looks like
> >
> > int
> > dr_misalignment (dr_vec_info *dr_info)
> > {
> >   if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
> >     {
> >       dr_vec_info *first_dr
> >         = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
> >       int misalign = first_dr->misalignment;
> >       gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> >       if (misalign == DR_MISALIGNMENT_UNKNOWN)
> >         return misalign;
> >       /* vect_analyze_data_ref_accesses guarantees that DR_INIT are
> >          INTEGER_CSTs and the first element in the group has the lowest
> >          address.  */
> >       poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr))
> >                               - wi::to_poly_offset (DR_INIT 
> > (first_dr->dr)));
> >       gcc_assert (known_ge (diff, 0));
> >       poly_uint64 mispoly = misalign + diff.force_uhwi ();
> >       unsigned HOST_WIDE_INT quotient;
> >       if (can_div_trunc_p (mispoly, first_dr->target_alignment, &quotient,
> >                            &mispoly)
> >           && mispoly.is_constant ())
> >         return mispoly.to_constant ();
> >       return DR_MISALIGNMENT_UNKNOWN;
> >
> > I wonder why a non-poly works for the quotient here.  I suppose it's
> > because the runtime factor is the same for all poly-ints and thus
> > it cancels out?  But then the remainder likely will never be constant
> > unless it is zero?
> 
> It's not so much that the quotient is guaranteed to be constant,
> but that that particular overload is designed for the case in which
> the caller only cares about constant quotients, and wants can_div_trunc_p
> to fail otherwise.  In other words, it's really just a way of cutting down
> on is_constant calls.
> 
> poly-int.h doesn't provide every possible overload of can_div_trunc_p.
> In principle it could have a fully-general 4-poly_int version, but we
> haven't needed that yet.  It could also have a new overload for the
> case we care about here, avoiding the separate is_constant.

Hmm.  I guess I'll see to somehow unify this with the logic in
vect_compute_data_ref_alignment which at the moment forces
DR_MISALIGNMENT to unknown if DR_TARGET_ALIGNMENT of the vector type
isn't constant (so for SVE all accesses appear unaligned?).  In the
end vect_compute_data_ref_alignment does

  poly_int64 misalignment
    = base_misalignment + wi::to_poly_offset (drb->init).force_shwi ();

  /* If this is a backward running DR then first access in the larger
     vectype actually is N-1 elements before the address in the DR.
     Adjust misalign accordingly.  */
  if (tree_int_cst_sgn (drb->step) < 0)
    /* PLUS because STEP is negative.  */
    misalignment += ((TYPE_VECTOR_SUBPARTS (vectype) - 1)
                     * -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE 
(vectype))));

  unsigned int const_misalignment;
  if (!known_misalignment (misalignment, vect_align_c, 
&const_misalignment))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "Non-constant misalignment for access: %T\n", 
ref);
      return;
    }

  SET_DR_MISALIGNMENT (dr_info, const_misalignment);

ignoring the negative step offset (where poly-int TYPE_VECTOR_SUBPARTS
sneaks in) this combines integer base_misaligned with DR_INIT
(known INTEGER_CST) and then with the known constant vect_align_c.

So the most simplest version is

      if (misalign == DR_MISALIGNMENT_UNKNOWN)
        return misalign;
      /* vect_analyze_data_ref_accesses guarantees that DR_INIT are
         INTEGER_CSTs and the first element in the group has the lowest
         address.  Likewise vect_compute_data_ref_alignment will
         have ensured that target_alignment is constant and otherwise
         set misalign to DR_MISALIGNMENT_UNKNOWN.  */
      HOST_WIDE_INT diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
                            - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
      gcc_assert (diff >= 0);
      unsigned HOST_WIDE_INT target_alignment_c 
        = first_dr->target_alignment.to_constant ();
      return (misalign + diff) % target_alignment_c;


Note my intent is to lift the restriction of having only one
vector type for vectorizing a DR group and alignment is where
currently correctness issues pop up so in the end
dr_misalignment () would get a vectype parameter (and the
odd negative STEP handling would move to the caller, eventually
with another 'offset' parameter to dr_misalignment () / aligned_access_p 
()).

So if you are fine with improving the situation for poly-ints
after all this then I'd go with the most simplest variant above.

OK?

Thanks,
Richard.

Reply via email to