On Fri, 15 Nov 2024, Alex Coplan wrote:

> Hi,
> 
> This is a v2 which hopefully addresses the feedback for v1 of the 1/5
> patch, originally posted here:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666648.html
> 
> As mentioned on IRC, it will need follow-up work to fix up latent
> profile issues, but that can be done during stage 3.  We will also need
> a simple (hopefully obvious, even) follow-up patch to fix expectations
> for various tests (since we now vectorize loops which we previously
> couldn't).
> 
> OK for trunk?

I'm still looking at

+  if (dr_info->need_peeling_for_alignment)
+    {
+      /* Vector size in bytes.  */
+      poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT 
(vectype));
+
+      /* We can only peel for loops, of course.  */
+      gcc_checking_assert (loop_vinfo);
+
+      /* Calculate the number of vectors read per vector iteration.  If
+        it is a power of two, multiply through to get the required
+        alignment in bytes.  Otherwise, fail analysis since alignment
+        peeling wouldn't work in such a case.  */
+      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+      if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+       vf *= DR_GROUP_SIZE (stmt_info);

so this is the total number of scalars we load, so maybe call it
that way, num_scalars.

+
+      auto num_vectors = vect_get_num_vectors (vf, vectype);
+      if (!pow2p_hwi (num_vectors))

side-note - with all these multiplies there's the possibility that
we get a testcase that has safe_align > PAGE_SIZE, meaning it's
no longer a good way to avoid trapping.  This problem of course
exists generally, we avoid it elsewhere by not having very large
vectors or limiting the group-size.  The actual problem is that
we don't know the actual page size, but we maybe could configure
a minimum as part of the platform configuration.  Should we for
now simply add

  || safe_align > 4096

here?  A testcase would load 512 contiguous uint64 to form an early exit
condition, quite unlikely I guess.

With DR_GROUP_SIZE != 1 there's also the question whether we can
ever reach the desired alignment since each peel will skip
DR_GROUP_SIZE scalar elements - either those are already
DR_GROUP_SIZE aligned or the result will never be.

@@ -7208,7 +7277,8 @@ vect_supportable_dr_alignment (vec_info *vinfo, 
dr_vec_info *dr_info,
   if (misalignment == DR_MISALIGNMENT_UNKNOWN)
     is_packed = not_size_aligned (DR_REF (dr));
   if (targetm.vectorize.support_vector_misalignment (mode, type, 
misalignment,
-                                                    is_packed))
+                                                    is_packed)
+      && !dr_info->need_peeling_for_alignment)
     return dr_unaligned_supported;

I think you need to do this earlier, like with

  if (misalignment == 0)
    return dr_aligned;
+ else if (dr_info->need_peeling_for_alignment)
+   return dr_unaligned_unsupported;

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index c8dc7153298..be2c2a1bc75 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3129,12 +3129,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
niters, tree nitersm1,
   int estimated_vf;
   int prolog_peeling = 0;
   bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL;
-  /* We currently do not support prolog peeling if the target alignment 
is not
-     known at compile time.  'vect_gen_prolog_loop_niters' depends on the
-     target alignment being constant.  */
-  dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
-  if (dr_info && !DR_TARGET_ALIGNMENT (dr_info).is_constant ())
-    return NULL;

this indeed looks like an odd (wrong) place to enforce this, so I
suppose you figured the check is not needed, independent of the patch?

OK with the above changes.

Thanks,
Richard.


> Thanks,
> Alex
> 
> -- >8 --
> 
> This allows us to vectorize more loops with early exits by forcing
> peeling for alignment to make sure that we're guaranteed to be able to
> safely read an entire vector iteration without crossing a page boundary.
> 
> To make this work for VLA architectures we have to allow compile-time
> non-constant target alignments.  We also have to override the result of
> the target's preferred_vector_alignment hook if it isn't already a
> power-of-two multiple of the amount read per vector iteration.
> 
> gcc/ChangeLog:
> 
>       * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
>       Set need_peeling_for_alignment flag on read DRs instead of
>       failing vectorization.  Punt on gathers and strided_p accesses.
>       (dr_misalignment): Handle non-constant target alignments.
>       (vect_compute_data_ref_alignment): If need_peeling_for_alignment
>       flag is set on the DR, then override the target alignment chosen
>       by the preferred_vector_alignment hook to choose a safe
>       alignment.  Add new result parameter.  Use it ...
>       (vect_analyze_data_refs_alignment): ... here, and fail
>       analysis if vect_compute_data_ref_alignment sets it to a
>       failure.
>       (vect_supportable_dr_alignment): Override
>       support_vector_misalignment hook if need_peeling_for_alignment
>       is set on the DR: in this case we must return
>       dr_unaligned_unsupported in order to force peeling.
>       * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
>       peeling by a compile-time non-constant amount.
>       * tree-vectorizer.h (dr_vec_info): Add new flag
>       need_peeling_for_alignment.
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to