On Wed, 15 Sep 2021, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > On Tue, 14 Sep 2021, Richard Sandiford wrote: > > > >> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > >> > This changes us to maintain and compute (mis-)alignment info for > >> > the first element of a group only rather than for each DR when > >> > doing interleaving and for the earliest, first, or first in the SLP > >> > node (or any pair or all three of those) when SLP vectorizing. > >> > > >> > For this to work out the easiest way I have changed the accessors > >> > DR_MISALIGNMENT and DR_TARGET_ALIGNMENT to do the indirection to > >> > the first element rather than adjusting all callers. > >> > dr_misalignment is moved out-of-line and I'm not too fond of the > >> > poly-int dances there (any hints?), but basically we are now > >> > adjusting the first elements misalignment based on the DR_INIT > >> > difference. > >> > > >> > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > >> > > >> > Richard. > >> > > >> > 2021-09-13 Richard Biener <rguent...@suse.de> > >> > > >> > * tree-vectorizer.h (dr_misalignment): Move out of line. > >> > (dr_target_alignment): New. > >> > (DR_TARGET_ALIGNMENT): Wrap dr_target_alignment. > >> > (set_dr_target_alignment): New. > >> > (SET_DR_TARGET_ALIGNMENT): Wrap set_dr_target_alignment. > >> > * tree-vect-data-refs.c (dr_misalignment): Compute and > >> > return the group members misalignment. > >> > (vect_compute_data_ref_alignment): Use SET_DR_TARGET_ALIGNMENT. > >> > (vect_analyze_data_refs_alignment): Compute alignment only > >> > for the first element of a DR group. > >> > (vect_slp_analyze_node_alignment): Likewise. > >> > --- > >> > gcc/tree-vect-data-refs.c | 65 ++++++++++++++++++++++++--------------- > >> > gcc/tree-vectorizer.h | 24 ++++++++++----- > >> > 2 files changed, 57 insertions(+), 32 deletions(-) > >> > > >> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > >> > index 66e76132d14..b53d6a0b3f1 100644 > >> > --- a/gcc/tree-vect-data-refs.c > >> > +++ b/gcc/tree-vect-data-refs.c > >> > @@ -887,6 +887,36 @@ vect_slp_analyze_instance_dependence (vec_info > >> > *vinfo, slp_instance instance) > >> > return res; > >> > } > >> > > >> > +/* Return the misalignment of DR_INFO. */ > >> > + > >> > +int > >> > +dr_misalignment (dr_vec_info *dr_info) > >> > +{ > >> > + if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)) > >> > + { > >> > + dr_vec_info *first_dr > >> > + = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt)); > >> > + int misalign = first_dr->misalignment; > >> > + gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED); > >> > + if (misalign == DR_MISALIGNMENT_UNKNOWN) > >> > + return misalign; > >> > + poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr)) > >> > + - wi::to_poly_offset (DR_INIT > >> > (first_dr->dr))); > >> > + poly_int64 mispoly = misalign + diff.to_constant ().to_shwi (); > >> > + bool res = known_misalignment (mispoly, > >> > + > >> > first_dr->target_alignment.to_constant (), > >> > + &misalign); > >> > + gcc_assert (res); > >> > + return misalign; > >> > >> Yeah, not too keen on the to_constants here. The one on diff looks > >> redundant -- you could just use diff.force_shwi () instead, and > >> keep everything poly_int. > >> > >> For the known_misalignment I think we should use: > >> > >> if (!can_div_trunc_p (mispoly, first_dr->target_alignment, > >> "ient, &misalign)) > >> misalign = DR_MISALIGNMENT_UNKNOWN; > >> return misalign; > >> > >> There are then no to_constant assumptions. > > > > OK, note that group analysis does > > > > /* Check that the DR_INITs are compile-time constants. */ > > if (TREE_CODE (DR_INIT (dra)) != INTEGER_CST > > || TREE_CODE (DR_INIT (drb)) != INTEGER_CST) > > break; > > > > /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb). */ > > HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra)); > > HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb)); > > > > so I'm confident my variant was "correct", but it still was ugly. > > Ah, OK. In that case I don't mind the original version, but it would be > good to have a comment above the to_constant saying where the condition > is enforced. > > I'm just trying to avoid to_constant calls with no comment to explain > them, and with no nearby is_constant call. Otherwise it could end up > a bit like tree_to_uhwi, where sometimes tree_fits_uhwi_p really has > been checked earlier (not always obvious where) and sometimes > tree_to_uhwi is just used out of hope, to avoid having to think about > the alternative. > > > There's also the issue that target_alignment is poly_uint64 but > > misalignment is signed int. > > > > Note that can_div_trunc_p seems to require a poly_uint64 remainder, > > I'm not sure what to do with that, so I used is_constant. > > Ah, yeah, forgot about that sorry. I guess in that case, using > is_constant on first_dr->target_alignment and sticking with > known_misalignment would make sense. > > > Btw, to what value do we want to align with variable sized vectors? > > We are aligning globals according to target_alignment but only > > support that for constant alignment. Most users only want to > > distinguish between aligned or not aligned and the actual misalignment > > is only used to seed the SSA alignment info, so I suppose being > > too conservative in dr_misalignment for variable size vectors isn't > > too bad if we correctly identify fully aligned accesses? > > In practice we don't require more than element alignment for VLA SVE > as things stand. However, there's support for using fully-predicated > loops in which the first loop iteration aligns by using leading > inactive lanes where possible (e.g. start with 1 inactive lane > for a misalignment of 1). In principle that would work for VLA too, > we just haven't implemented it yet. > > > I'm now testing a variant that looks like > > > > int > > dr_misalignment (dr_vec_info *dr_info) > > { > > if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt)) > > { > > dr_vec_info *first_dr > > = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt)); > > int misalign = first_dr->misalignment; > > gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED); > > if (misalign == DR_MISALIGNMENT_UNKNOWN) > > return misalign; > > /* vect_analyze_data_ref_accesses guarantees that DR_INIT are > > INTEGER_CSTs and the first element in the group has the lowest > > address. */ > > poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr)) > > - wi::to_poly_offset (DR_INIT > > (first_dr->dr))); > > gcc_assert (known_ge (diff, 0)); > > poly_uint64 mispoly = misalign + diff.force_uhwi (); > > unsigned HOST_WIDE_INT quotient; > > if (can_div_trunc_p (mispoly, first_dr->target_alignment, "ient, > > &mispoly) > > && mispoly.is_constant ()) > > return mispoly.to_constant (); > > return DR_MISALIGNMENT_UNKNOWN; > > > > I wonder why a non-poly works for the quotient here. I suppose it's > > because the runtime factor is the same for all poly-ints and thus > > it cancels out? But then the remainder likely will never be constant > > unless it is zero? > > It's not so much that the quotient is guaranteed to be constant, > but that that particular overload is designed for the case in which > the caller only cares about constant quotients, and wants can_div_trunc_p > to fail otherwise. In other words, it's really just a way of cutting down > on is_constant calls. > > poly-int.h doesn't provide every possible overload of can_div_trunc_p. > In principle it could have a fully-general 4-poly_int version, but we > haven't needed that yet. It could also have a new overload for the > case we care about here, avoiding the separate is_constant.
Hmm. I guess I'll see to somehow unify this with the logic in vect_compute_data_ref_alignment which at the moment forces DR_MISALIGNMENT to unknown if DR_TARGET_ALIGNMENT of the vector type isn't constant (so for SVE all accesses appear unaligned?). In the end vect_compute_data_ref_alignment does poly_int64 misalignment = base_misalignment + wi::to_poly_offset (drb->init).force_shwi (); /* If this is a backward running DR then first access in the larger vectype actually is N-1 elements before the address in the DR. Adjust misalign accordingly. */ if (tree_int_cst_sgn (drb->step) < 0) /* PLUS because STEP is negative. */ misalignment += ((TYPE_VECTOR_SUBPARTS (vectype) - 1) * -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)))); unsigned int const_misalignment; if (!known_misalignment (misalignment, vect_align_c, &const_misalignment)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "Non-constant misalignment for access: %T\n", ref); return; } SET_DR_MISALIGNMENT (dr_info, const_misalignment); ignoring the negative step offset (where poly-int TYPE_VECTOR_SUBPARTS sneaks in) this combines integer base_misaligned with DR_INIT (known INTEGER_CST) and then with the known constant vect_align_c. So the most simplest version is if (misalign == DR_MISALIGNMENT_UNKNOWN) return misalign; /* vect_analyze_data_ref_accesses guarantees that DR_INIT are INTEGER_CSTs and the first element in the group has the lowest address. Likewise vect_compute_data_ref_alignment will have ensured that target_alignment is constant and otherwise set misalign to DR_MISALIGNMENT_UNKNOWN. */ HOST_WIDE_INT diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr)) - TREE_INT_CST_LOW (DR_INIT (first_dr->dr))); gcc_assert (diff >= 0); unsigned HOST_WIDE_INT target_alignment_c = first_dr->target_alignment.to_constant (); return (misalign + diff) % target_alignment_c; Note my intent is to lift the restriction of having only one vector type for vectorizing a DR group and alignment is where currently correctness issues pop up so in the end dr_misalignment () would get a vectype parameter (and the odd negative STEP handling would move to the caller, eventually with another 'offset' parameter to dr_misalignment () / aligned_access_p ()). So if you are fine with improving the situation for poly-ints after all this then I'd go with the most simplest variant above. OK? Thanks, Richard.