On Fri, 15 Nov 2024, Alex Coplan wrote: > Hi, > > This is a v2 which hopefully addresses the feedback for v1 of the 1/5 > patch, originally posted here: > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666648.html > > As mentioned on IRC, it will need follow-up work to fix up latent > profile issues, but that can be done during stage 3. We will also need > a simple (hopefully obvious, even) follow-up patch to fix expectations > for various tests (since we now vectorize loops which we previously > couldn't). > > OK for trunk?
I'm still looking at + if (dr_info->need_peeling_for_alignment) + { + /* Vector size in bytes. */ + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype)); + + /* We can only peel for loops, of course. */ + gcc_checking_assert (loop_vinfo); + + /* Calculate the number of vectors read per vector iteration. If + it is a power of two, multiply through to get the required + alignment in bytes. Otherwise, fail analysis since alignment + peeling wouldn't work in such a case. */ + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) + vf *= DR_GROUP_SIZE (stmt_info); so this is the total number of scalars we load, so maybe call it that way, num_scalars. + + auto num_vectors = vect_get_num_vectors (vf, vectype); + if (!pow2p_hwi (num_vectors)) side-note - with all these multiplies there's the possibility that we get a testcase that has safe_align > PAGE_SIZE, meaning it's no longer a good way to avoid trapping. This problem of course exists generally, we avoid it elsewhere by not having very large vectors or limiting the group-size. The actual problem is that we don't know the actual page size, but we maybe could configure a minimum as part of the platform configuration. Should we for now simply add || safe_align > 4096 here? A testcase would load 512 contiguous uint64 to form an early exit condition, quite unlikely I guess. With DR_GROUP_SIZE != 1 there's also the question whether we can ever reach the desired alignment since each peel will skip DR_GROUP_SIZE scalar elements - either those are already DR_GROUP_SIZE aligned or the result will never be. @@ -7208,7 +7277,8 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info, if (misalignment == DR_MISALIGNMENT_UNKNOWN) is_packed = not_size_aligned (DR_REF (dr)); if (targetm.vectorize.support_vector_misalignment (mode, type, misalignment, - is_packed)) + is_packed) + && !dr_info->need_peeling_for_alignment) return dr_unaligned_supported; I think you need to do this earlier, like with if (misalignment == 0) return dr_aligned; + else if (dr_info->need_peeling_for_alignment) + return dr_unaligned_unsupported; diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index c8dc7153298..be2c2a1bc75 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -3129,12 +3129,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, int estimated_vf; int prolog_peeling = 0; bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL; - /* We currently do not support prolog peeling if the target alignment is not - known at compile time. 'vect_gen_prolog_loop_niters' depends on the - target alignment being constant. */ - dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo); - if (dr_info && !DR_TARGET_ALIGNMENT (dr_info).is_constant ()) - return NULL; this indeed looks like an odd (wrong) place to enforce this, so I suppose you figured the check is not needed, independent of the patch? OK with the above changes. Thanks, Richard. > Thanks, > Alex > > -- >8 -- > > This allows us to vectorize more loops with early exits by forcing > peeling for alignment to make sure that we're guaranteed to be able to > safely read an entire vector iteration without crossing a page boundary. > > To make this work for VLA architectures we have to allow compile-time > non-constant target alignments. We also have to override the result of > the target's preferred_vector_alignment hook if it isn't already a > power-of-two multiple of the amount read per vector iteration. > > gcc/ChangeLog: > > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): > Set need_peeling_for_alignment flag on read DRs instead of > failing vectorization. Punt on gathers and strided_p accesses. > (dr_misalignment): Handle non-constant target alignments. > (vect_compute_data_ref_alignment): If need_peeling_for_alignment > flag is set on the DR, then override the target alignment chosen > by the preferred_vector_alignment hook to choose a safe > alignment. Add new result parameter. Use it ... > (vect_analyze_data_refs_alignment): ... here, and fail > analysis if vect_compute_data_ref_alignment sets it to a > failure. > (vect_supportable_dr_alignment): Override > support_vector_misalignment hook if need_peeling_for_alignment > is set on the DR: in this case we must return > dr_unaligned_unsupported in order to force peeling. > * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog > peeling by a compile-time non-constant amount. > * tree-vectorizer.h (dr_vec_info): Add new flag > need_peeling_for_alignment. > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)