On Mon, 28 Oct 2024, Alex Coplan wrote:

> This allows us to vectorize more loops with early exits by forcing
> peeling for alignment to make sure that we're guaranteed to be able to
> safely read an entire vector iteration without crossing a page boundary.
>
> To make this work for VLA architectures we have to allow compile-time
> non-constant target alignments.  We also have to override the result of
> the target's preferred_vector_alignment hook if it isn't a power-of-two
> multiple of the TYPE_SIZE of the chosen vector type.
>
> There is currently an implicit assumption that the TYPE_SIZE of the
> vector type is itself a power of two.  For non-VLA types this
> could be checked directly in the vectorizer.  For VLA types I
> had discussed offline with Richard S about adding a target hook to allow
> the vectorizer to query the backend to confirm that a given VLA type
> is known to have a power-of-two size at runtime.
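
As an aside on the page-boundary argument quoted above, here is a minimal
sketch of why the alignment has to be a power of two.  It is not GCC code:
crosses_page and any_aligned_read_crosses are hypothetical helpers, and the
4096-byte page size used in the note below is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of GCC): true if reading `nbytes` bytes
// starting at `addr` touches more than one page of size `page_size`.
bool crosses_page(std::uint64_t addr, std::uint64_t nbytes,
                  std::uint64_t page_size)
{
  // Assumption: page sizes are non-zero powers of two.
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  if (nbytes == 0)
    return false;
  return (addr / page_size) != ((addr + nbytes - 1) / page_size);
}

// Hypothetical helper: true if some `align`-aligned address below `limit`
// exists at which an `align`-byte read crosses a page boundary.
bool any_aligned_read_crosses(std::uint64_t align, std::uint64_t page_size,
                              std::uint64_t limit)
{
  for (std::uint64_t addr = 0; addr < limit; addr += align)
    if (crosses_page(addr, align, page_size))
      return true;
  return false;
}
```

With a 4096-byte page, a 16-byte or 64-byte read from a correspondingly
aligned address never crosses a page, because those alignments divide the
page size.  A 48-byte "alignment" does not divide it: the read starting at
address 4080 spans the boundary at 4096.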

GCC assumes all vectors have power-of-two size, so I don't think we need
to check anything but we'd instead have to make sure the target constrains
the hardware when this assumption doesn't hold in silicon.

> I thought we
> might be able to do this check in vector_alignment_reachable_p.  Any
> thoughts on that, richi?

For the purpose of alignment peeling yeah, I guess this would be a
possible place to check this.  The hook is currently used for the case
where the element has a lower alignment than its size and thus vector
alignment cannot be reached by peeling.

Btw, I thought we can already apply peeling for alignment for VLA
vectors ...

> gcc/ChangeLog:
>
>     * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
>     Set need_peeling_for_alignment flag on read DRs instead of
>     failing vectorization.  Punt on gathers.
>     (dr_misalignment): Handle non-constant target alignments.
>     (vect_compute_data_ref_alignment): If need_peeling_for_alignment
>     flag is set on the DR, then override the target alignment chosen
>     by the preferred_vector_alignment hook to choose a safe
>     alignment.
>     (vect_supportable_dr_alignment): Override
>     support_vector_misalignment hook if need_peeling_for_alignment
>     is set on the DR: in this case we must return
>     dr_unaligned_unsupported in order to force peeling.
>     * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
>     peeling by a compile-time non-constant amount.
>     * tree-vectorizer.h (dr_vec_info): Add new flag
>     need_peeling_for_alignment.
> ---
>  gcc/tree-vect-data-refs.cc  | 77 ++++++++++++++++++++++++++++++-------
>  gcc/tree-vect-loop-manip.cc |  6 ---
>  gcc/tree-vectorizer.h       |  5 +++
>  3 files changed, 68 insertions(+), 20 deletions(-)

Eh, where's the inline copy ...

@@ -739,15 +739,22 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
           if (DR_IS_READ (dr_ref)
               && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
             {
+              if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
+                {
+                  const char *msg

You want to add STMT_VINFO_STRIDED_P here as well.

       /* Vector size in bytes.  */
+      poly_uint64 safe_align
+        = exact_div (tree_to_poly_uint64 (TYPE_SIZE (vectype)), BITS_PER_UNIT);

This can simply be

  safe_align = TYPE_SIZE_UNIT (vectype);

+      /* Multiply by the unroll factor to get the number of bytes read
+         per vector iteration.  */
+      if (loop_vinfo)
+        {
+          auto num_copies = vect_get_num_copies (loop_vinfo, vectype);
+          gcc_checking_assert (pow2p_hwi (num_copies));
+          safe_align *= num_copies;

The unroll factor is the vectorization factor - I think the above goes
wrong for grouped accesses like an early break condition

  if (a[2*i] == a[2*i+1])

or so.  Thus, multiply by LOOP_VINFO_VECT_FACTOR (loop_vinfo).  Note that
this number doesn't need to be a power of two (and neither does
num_copies above).

The rest of the patch looks good to me.

Richard.
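
To make the grouped-access concern above concrete, here is a rough sketch of
the arithmetic, assuming only that one vector iteration covers VF scalar
iterations.  It is plain arithmetic, not vectorizer code: AccessGroup,
bytes_per_vector_iteration and is_power_of_two are hypothetical names, and
how the patch should fold the vectorization factor into safe_align is left
to the discussion in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one contiguous access group in a loop body,
// e.g. a[2*i] and a[2*i+1] form a group of size 2.
struct AccessGroup
{
  std::uint64_t elem_size;   // bytes per scalar element
  std::uint64_t group_size;  // contiguous elements read per scalar iteration
};

// Bytes the group reads in one vector iteration, assuming a vector
// iteration stands for `vf` scalar iterations.
std::uint64_t bytes_per_vector_iteration(const AccessGroup &g,
                                         std::uint64_t vf)
{
  assert(vf != 0);
  return vf * g.group_size * g.elem_size;
}

// True if `x` is a non-zero power of two.
bool is_power_of_two(std::uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}
```

For 4-byte elements and the a[2*i] / a[2*i+1] group, VF = 4 gives 32 bytes
per vector iteration, which is two 16-byte vectors rather than one.  With
VF = 3 the result is 24 bytes, which is not a power of two, so an assertion
along the lines of pow2p_hwi would not hold in general.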