On 21/11/2024 10:02, Richard Biener wrote: > On Fri, 15 Nov 2024, Alex Coplan wrote: > > > Hi, > > > > This is a v2 which hopefully addresses the feedback for v1 of the 1/5 > > patch, originally posted here: > > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666648.html > > > > As mentioned on IRC, it will need follow-up work to fix up latent > > profile issues, but that can be done during stage 3. We will also need > > a simple (hopefully obvious, even) follow-up patch to fix expectations > > for various tests (since we now vectorize loops which we previously > > couldn't). > > > > OK for trunk? > > I'm still looking at > > + if (dr_info->need_peeling_for_alignment) > + { > + /* Vector size in bytes. */ > + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT > (vectype)); > + > + /* We can only peel for loops, of course. */ > + gcc_checking_assert (loop_vinfo); > + > + /* Calculate the number of vectors read per vector iteration. If > + it is a power of two, multiply through to get the required > + alignment in bytes. Otherwise, fail analysis since alignment > + peeling wouldn't work in such a case. */ > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > + vf *= DR_GROUP_SIZE (stmt_info); > > so this is the total number of scalars we load, so maybe call it > that way, num_scalars.
Will do. > > + > + auto num_vectors = vect_get_num_vectors (vf, vectype); > + if (!pow2p_hwi (num_vectors)) > > side-note - with all these multiplies there's the possibility that > we get a testcase that has safe_align > PAGE_SIZE, meaning it's > no longer a good way to avoid trapping. This problem of course > exists generally, we avoid it elsewhere by not having very large > vectors or limiting the group-size. The actual problem is that > we don't know the actual page size, but we maybe could configure > a minimum as part of the platform configuration. Should we for > now simply add > > || safe_align > 4096 > > here? A testcase would load 512 contiguous uint64 to form an early exit > condition, quite unlikely I guess. Good point. I suppose this really depends whether there are targets/platforms that GCC supports with a page size smaller than 4k. Perhaps a min_page_size target hook (defaulting to 4k) would be sensible. Then if there are any such targets they can override the hook. WDYT? > > With DR_GROUP_SIZE != 1 there's also the question whether we can > ever reach the desired alignment since each peel will skip > DR_GROUP_SIZE scalar elements - either those are already > DR_GROUP_SIZE aligned or the result will never be. I had a look at this, I think this should already be handled by vector_alignment_reachable_p which already includes a suitable check for STMT_VINFO_GROUPED_ACCESS. E.g. for the following testcase: double *a; int f(int n) { for (int i = 0; i < n; i += 2) if (a[i]) __builtin_abort(); } we have DR_GROUP_SIZE = 2 and in the dump (-O3, with the alignment peeling patches) we see: note: === vect_enhance_data_refs_alignment === missed: vector alignment may not be reachable note: vect_can_advance_ivs_p: note: Analyze phi: i_12 = PHI <i_9(8), 0(7)> note: Alignment of access forced using versioning. note: Versioning for alignment will be applied. so we decide to version instead, since peeling isn't viable here (vector_alignment_reachable_p correctly returns false). Perhaps that says the DR flag should really be called need_peeling_or_versioning_for_alignment (although better I guess just to update the comment above its definition to mention versioning). > > @@ -7208,7 +7277,8 @@ vect_supportable_dr_alignment (vec_info *vinfo, > dr_vec_info *dr_info, > if (misalignment == DR_MISALIGNMENT_UNKNOWN) > is_packed = not_size_aligned (DR_REF (dr)); > if (targetm.vectorize.support_vector_misalignment (mode, type, > misalignment, > - is_packed)) > + is_packed) > + && !dr_info->need_peeling_for_alignment) > return dr_unaligned_supported; > > I think you need to do this earlier, like with > > if (misalignment == 0) > return dr_aligned; > + else if (dr_info->need_peeling_for_alignment) > + return dr_unaligned_unsupported; That seems more obviously correct indeed. I'll make that change. > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > index c8dc7153298..be2c2a1bc75 100644 > --- a/gcc/tree-vect-loop-manip.cc > +++ b/gcc/tree-vect-loop-manip.cc > @@ -3129,12 +3129,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree > niters, tree nitersm1, > int estimated_vf; > int prolog_peeling = 0; > bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL; > - /* We currently do not support prolog peeling if the target alignment > is not > - known at compile time. 'vect_gen_prolog_loop_niters' depends on the > - target alignment being constant. */ > - dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo); > - if (dr_info && !DR_TARGET_ALIGNMENT (dr_info).is_constant ()) > - return NULL; > > this indeed looks like an odd (wrong) place to enforce this, so I > suppose you figured the check is not needed, independent of the patch? Yeah, exactly, it seems that it isn't needed. > > OK with the above changes. Thanks! I guess we just need to decide on the page size issue above before going ahead. Alex > > Thanks, > Richard. > > > > Thanks, > > Alex > > > > -- >8 -- > > > > This allows us to vectorize more loops with early exits by forcing > > peeling for alignment to make sure that we're guaranteed to be able to > > safely read an entire vector iteration without crossing a page boundary. > > > > To make this work for VLA architectures we have to allow compile-time > > non-constant target alignments. We also have to override the result of > > the target's preferred_vector_alignment hook if it isn't already a > > power-of-two multiple of the amount read per vector iteration. > > > > gcc/ChangeLog: > > > > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): > > Set need_peeling_for_alignment flag on read DRs instead of > > failing vectorization. Punt on gathers and strided_p accesses. > > (dr_misalignment): Handle non-constant target alignments. > > (vect_compute_data_ref_alignment): If need_peeling_for_alignment > > flag is set on the DR, then override the target alignment chosen > > by the preferred_vector_alignment hook to choose a safe > > alignment. Add new result parameter. Use it ... > > (vect_analyze_data_refs_alignment): ... here, and fail > > analysis if vect_compute_data_ref_alignment sets it to a > > failure. > > (vect_supportable_dr_alignment): Override > > support_vector_misalignment hook if need_peeling_for_alignment > > is set on the DR: in this case we must return > > dr_unaligned_unsupported in order to force peeling. > > * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog > > peeling by a compile-time non-constant amount. > > * tree-vectorizer.h (dr_vec_info): Add new flag > > need_peeling_for_alignment. > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)