On Mon, 28 Oct 2024, Alex Coplan wrote:

> This allows us to vectorize more loops with early exits by forcing
> peeling for alignment to make sure that we're guaranteed to be able to
> safely read an entire vector iteration without crossing a page boundary.
> 
> To make this work for VLA architectures we have to allow compile-time
> non-constant target alignments.  We also have to override the result of
> the target's preferred_vector_alignment hook if it isn't a power-of-two
> multiple of the TYPE_SIZE of the chosen vector type.
> 
> There is currently an implicit assumption that the TYPE_SIZE of the
> vector type is itself a power of two.  For non-VLA types this
> could be checked directly in the vectorizer.  For VLA types I
> had discussed offline with Richard S about adding a target hook to allow
> the vectorizer to query the backend to confirm that a given VLA type
> is known to have a power-of-two size at runtime.

GCC assumes that all vectors have a power-of-two size, so I don't
think we need to check anything.  Instead, the target has to
constrain the hardware when this assumption doesn't hold in
silicon.
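
As an illustration of why the power-of-two assumption matters for the
page-crossing guarantee, here is a minimal standalone sketch (not GCC
code).  The function names and the 4096-byte page size used in the
usage note are assumptions made for the example.  It checks whether any
load whose address is a multiple of its own size can straddle a page
boundary.

```cpp
#include <cassert>
#include <cstdint>

// True if the byte range [addr, addr + size) touches more than one page.
bool crosses_page(std::uint64_t addr, std::uint64_t size,
                  std::uint64_t page_size) {
    assert(size > 0 && page_size > 0);
    return addr / page_size != (addr + size - 1) / page_size;
}

// Tries every size-aligned address in [0, size * page_size), which covers
// a full period of the (address mod size, address mod page_size) pattern.
// Returns true if at least one such load crosses a page boundary.
bool some_aligned_load_crosses_page(std::uint64_t size,
                                    std::uint64_t page_size) {
    assert(size > 0 && size <= page_size);
    for (std::uint64_t addr = 0; addr < size * page_size; addr += size)
        if (crosses_page(addr, size, page_size))
            return true;
    return false;
}
```

With a 4096-byte page, some_aligned_load_crosses_page (16, 4096) is
false, while a hypothetical 48-byte vector aligned to 48 bytes can
start at address 4080 and run into the next page.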

>  I thought we
> might be able to do this check in vector_alignment_reachable_p.  Any
> thoughts on that, richi?

For the purpose of alignment peeling yeah, I guess this would be
a possible place to check this.  The hook is currently used for
the case where the element has a lower alignment than its
size and thus vector alignment cannot be reached by peeling.
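
The case the hook guards against can be modelled with a small
standalone sketch.  This is not the hook's implementation;
peel_count_for_alignment is a hypothetical helper that assumes each
peeled scalar iteration advances the address by exactly one element.

```cpp
#include <cassert>
#include <cstdint>

// Returns the number of scalar iterations to peel so that the access
// address becomes a multiple of target_align, or -1 if no peel count
// can achieve that.
std::int64_t peel_count_for_alignment(std::uint64_t addr,
                                      std::uint64_t elem_size,
                                      std::uint64_t target_align) {
    assert(elem_size > 0 && target_align > 0);
    // The residue (addr + n * elem_size) % target_align repeats after at
    // most target_align steps, so trying that many peel counts suffices.
    for (std::uint64_t n = 0; n < target_align; ++n)
        if ((addr + n * elem_size) % target_align == 0)
            return static_cast<std::int64_t>(n);
    return -1;
}
```

For 4-byte elements and a 16-byte target alignment, an address of
0x1008 needs two peeled iterations, whereas 0x1002 (an element aligned
to only 2 bytes) can never reach 16-byte alignment by peeling.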

Btw, I thought we can already apply peeling for alignment for
VLA vectors ...

> gcc/ChangeLog:
> 
>       * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
>       Set need_peeling_for_alignment flag on read DRs instead of
>       failing vectorization.  Punt on gathers.
>       (dr_misalignment): Handle non-constant target alignments.
>       (vect_compute_data_ref_alignment): If need_peeling_for_alignment
>       flag is set on the DR, then override the target alignment chosen
>       by the preferred_vector_alignment hook to choose a safe
>       alignment.
>       (vect_supportable_dr_alignment): Override
>       support_vector_misalignment hook if need_peeling_for_alignment
>       is set on the DR: in this case we must return
>       dr_unaligned_unsupported in order to force peeling.
>       * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
>       peeling by a compile-time non-constant amount.
>       * tree-vectorizer.h (dr_vec_info): Add new flag
>       need_peeling_for_alignment.
> ---
>  gcc/tree-vect-data-refs.cc  | 77 ++++++++++++++++++++++++++++++-------
>  gcc/tree-vect-loop-manip.cc |  6 ---
>  gcc/tree-vectorizer.h       |  5 +++
>  3 files changed, 68 insertions(+), 20 deletions(-)

Eh, where's the inline copy ...

@@ -739,15 +739,22 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
          if (DR_IS_READ (dr_ref)
              && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
            {
+             if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
+               {
+                 const char *msg

You want to add STMT_VINFO_STRIDED_P to this check as well.

      /* Vector size in bytes.  */
+      poly_uint64 safe_align
+       = exact_div (tree_to_poly_uint64 (TYPE_SIZE (vectype)), BITS_PER_UNIT);

safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));

+      /* Multiply by the unroll factor to get the number of bytes read
+        per vector iteration.  */
+      if (loop_vinfo)
+       {
+         auto num_copies = vect_get_num_copies (loop_vinfo, vectype);
+         gcc_checking_assert (pow2p_hwi (num_copies));
+         safe_align *= num_copies;

The unroll factor is the vectorization factor.  I think the above goes
wrong for grouped accesses, such as an early break condition like

 if (a[2*i] == a[2*i+1])

or so.  Thus, multiply by LOOP_VINFO_VECT_FACTOR (loop_vinfo).
Note this number doesn't need to be a power of two (and neither does
num_copies above).
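
To make the grouped-access concern concrete, here is a rough
standalone sketch (not vectorizer code).  The GroupedAccess struct and
both helpers are hypothetical, and num_copies is modelled as VF divided
by the number of vector lanes, which is an assumption about the
non-SLP case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of an interleaved access such as
// a[2*i] and a[2*i+1] (group_size == 2).
struct GroupedAccess {
    std::uint64_t elem_bytes;  // size of one scalar element
    std::uint64_t group_size;  // elements read per scalar iteration
    std::uint64_t vf;          // scalar iterations per vector iteration
    std::uint64_t nunits;      // lanes in the chosen vector type
};

// Bytes the whole group reads in one vector iteration.
std::uint64_t bytes_read_per_vector_iteration(const GroupedAccess &a) {
    return a.vf * a.group_size * a.elem_bytes;
}

// The patch's estimate under the assumed model: vector size in bytes
// times num_copies, with num_copies taken to be vf / nunits.
std::uint64_t num_copies_estimate(const GroupedAccess &a) {
    assert(a.nunits > 0 && a.vf % a.nunits == 0);
    std::uint64_t vector_bytes = a.nunits * a.elem_bytes;
    return vector_bytes * (a.vf / a.nunits);
}
```

With 4-byte elements, a group of two, VF = 4 and four lanes per
vector, the group reads 32 bytes per vector iteration while the
num_copies-based estimate is only 16.  A group of three reads 48
bytes, which is not a power of two.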

The rest of the patch looks good to me.

Richard.
