Re: [PATCH v2 07/14] vect: Disable niters-based skipping of uncounted vectorized loops

Richard Biener Fri, 12 Dec 2025 06:46:00 -0800

On Thu, Dec 11, 2025 at 3:17 PM Victor Do Nascimento
<[email protected]> wrote:
>
> On 12/8/25 12:47, Richard Biener wrote:
> > On Mon, Nov 24, 2025 at 10:21 PM Victor Do Nascimento
> > <[email protected]> wrote:
> >>
> >> The iteration count profitability check is irrelevant for uncounted
> >> loops given that, even at runtime, the number of iterations is unknown
> >> at the start of loop execution.
> >>
> >> Likewise, the test for skipping the vectorized version of the loop is
> >> based on whether the number of iterations that will be run for the
> >> loop is smaller than the vectorization factor.  As this is
> >> undetermined, the check can never be run for uncounted loops.
> >>
> >> Consequently, we skip these checks.
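
To make the distinction in the commit message concrete, here is a minimal sketch contrasting a counted loop with an uncounted one. Both functions are hypothetical illustrations, not code from the patch series, and the assumption is that a data-dependent exit of this kind is what "uncounted" refers to here.

```cpp
#include <cassert>
#include <cstddef>

// Counted loop: the trip count n is known on entry (at run time at the
// latest), so a check such as "n >= threshold" can guard the vector loop.
int sum_counted (const int *a, std::size_t n)
{
  assert (a != nullptr);
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}

// Uncounted loop: the exit depends on data loaded inside the loop, so no
// iteration count exists before the loop starts and no such guard can be
// emitted.
int sum_until_sentinel (const int *a, int sentinel)
{
  assert (a != nullptr);
  int sum = 0;
  while (*a != sentinel)
    sum += *a++;
  return sum;
}
```

In the second function there is nothing to compare against a cost-model threshold or the vectorization factor, which is why the checks are skipped.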
> >
> > Apart from the two other comments I made the series looks OK overall.
> > Please work with Tamar to get those bits sorted out.
> >
> > On this patch I am now raising the cost model question - you skip any
> > computed profitability check in this patch, but shouldn't you instead
> > have made sure profitability is always there, thus no check is needed
> > in the first place?  I'd have expected asserts that we do not run into
> > these code paths with uncounted loops.
> >
> > What is the approach to avoid turning random uncounted loops into
> > vector loops and thus regressing performance a lot?
> >
> > Did you perform any benchmark runs with the patch set in?
>
> I ran SPEC CPU 2017 at various stages through development for AArch64
> and found no regressions that were of any statistical significance.
> I've run the equivalent benchmarks on x86_64, with similar results.
>
> As for your concerns about regressions: over-eagerly vectorizing
> loops where the resulting overhead is too expensive for short runs,
> and having the resulting loop execute unconditionally, runs counter to
> the goal of improved performance.
>
> Consequently, knowing that no one benchmark will exhaustively represent
> all real-life workloads, I have been thinking about how we can tweak the
> cost model to effectively be more selective in the loops we vectorize,
> given there's no check we can run on niters for profitability, even at
> runtime.
>
> Intuitively, my thought is to have some scalar penalty scaling factor we
> can apply to the single iteration cost of the vectorized loop.
>
> e.g. If we have a VF of 8, thus requiring that at the very least the
> vectorized loop cost is no more than 8x the scalar cost, we can be more
> stringent and halve this threshold, such that the loop is only
> vectorized if doing so is worth it even when the loop runs for just
> half a vector's length.  This would make the vectorized loop more
> palatable for workloads where the loop is not often run for many
> iterations.
>
> While it is a fairly primitive approach, it may well serve as a positive
> initial improvement to be made, which can be subsequently tweaked if
> deemed too conservative.
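
The penalty idea above can be sketched as a small predicate. This is only an illustration under stated assumptions: the function name, its parameters and the integer cost units are hypothetical and do not correspond to any existing GCC interface.

```cpp
#include <cassert>

// Hypothetical helper, not a GCC function.
//   scalar_iter_cost: modelled cost of one scalar iteration
//   vector_iter_cost: modelled cost of one vector iteration
//   vf:               vectorization factor
//   penalty:          divisor applied to the break-even threshold
//                     (1 = plain break-even, 2 = halved threshold)
bool uncounted_loop_profitable_p (unsigned scalar_iter_cost,
                                  unsigned vector_iter_cost,
                                  unsigned vf, unsigned penalty)
{
  assert (vf > 0 && penalty > 0);
  // Plain break-even:  vector_iter_cost <= vf * scalar_iter_cost.
  // With the penalty:  vector_iter_cost <= (vf / penalty) * scalar_iter_cost,
  // written without division to avoid rounding.
  return (unsigned long long) vector_iter_cost * penalty
         <= (unsigned long long) vf * scalar_iter_cost;
}
```

With a VF of 8 and a scalar iteration cost of 10, a vector iteration costing 60 passes the plain break-even test (60 <= 80) but is rejected once the threshold is halved (60 > 40).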


Yes, I thought of something like this, but more aggressive: require
the vector loop to be less than 2x as expensive as a single scalar iteration ;)
I guess we can for now leave this up to targets which can check
for the uncounted loop case in the finish_cost hook.  But I'd say we want
to make sure the 'cheap' cost model (aka what we do at -O2 by default)
doesn't vectorize uncounted loops?  OTOH at this point we can also
wait and see so we get actual problematic cases and adjust costing
during stage4.
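
A rough sketch of the stricter policy described above, for illustration only. The enum and function below are hypothetical stand-ins; they are not the finish_cost hook or GCC's actual cost-model enumeration, and a real target would make this decision inside its own costing code.

```cpp
#include <cassert>

// Hypothetical stand-in for the vectorizer cost model selection.
enum class cost_model_kind { cheap, dynamic };

// Hypothetical policy for uncounted loops:
//   - never vectorize under the cheap cost model;
//   - otherwise require one vector iteration to cost less than twice one
//     scalar iteration, regardless of the vectorization factor.
bool vectorize_uncounted_loop_p (cost_model_kind model,
                                 unsigned scalar_iter_cost,
                                 unsigned vector_iter_cost)
{
  assert (scalar_iter_cost > 0);
  if (model == cost_model_kind::cheap)
    return false;
  return vector_iter_cost < 2 * scalar_iter_cost;
}
```

Compared with scaling the VF-based threshold, this bound does not grow with the vectorization factor, so a loop that exits after very few iterations loses at most about one extra scalar iteration's worth of work.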

I believe apart from the ChangeLog/comment issue the patch series
is ready to commit.

Thanks,
Richard.

>
> Thanks,
> Victor
>
> > Thanks,
> > Richard.
> >
> >> gcc/ChangeLog:
> >>
> >>          * tree-vect-loop-manip.cc (vect_loop_versioning): Skip
> >>          profitability check for uncounted loops.
> >>          * tree-vect-loop-manip.cc (vect_do_peeling): Disable vector
> >>          loop skip checking.
> >> ---
> >>   gcc/tree-vect-loop-manip.cc | 17 ++++++++++-------
> >>   1 file changed, 10 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> >> index 3caaba30897..6bc474a3de0 100644
> >> --- a/gcc/tree-vect-loop-manip.cc
> >> +++ b/gcc/tree-vect-loop-manip.cc
> >> @@ -3289,11 +3289,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >>        because we have asserted that there are enough scalar iterations to enter
> >>        the main loop, so this skip is not necessary.  When we are versioning then
> >>        we only add such a skip if we have chosen to vectorize the epilogue.  */
> >> -  bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >> -                     ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
> >> -                                 bound_prolog + bound_epilog)
> >> -                     : (!LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING (loop_vinfo)
> >> -                        || vect_epilogues));
> >> +  bool skip_vector = false;
> >> +  if (!uncounted_p)
> >> +    skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >> +                  ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
> >> +                              bound_prolog + bound_epilog)
> >> +                  : (!LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING (loop_vinfo)
> >> +                     || vect_epilogues));
> >>
> >>     /* Epilog loop must be executed if the number of iterations for epilog
> >>        loop is known at compile time, otherwise we need to add a check at
> >> @@ -4152,13 +4154,14 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
> >>     tree version_simd_if_cond
> >>       = LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo);
> >>     unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> >> +  bool uncounted_p = LOOP_VINFO_NITERS_UNCOUNTED_P (loop_vinfo);
> >>
> >> -  if (vect_apply_runtime_profitability_check_p (loop_vinfo)
> >> +  if (!uncounted_p && vect_apply_runtime_profitability_check_p (loop_vinfo)
> >>         && !ordered_p (th, versioning_threshold))
> >>       cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
> >>                               build_int_cst (TREE_TYPE (scalar_loop_iters),
> >>                                              th - 1));
> >> -  if (maybe_ne (versioning_threshold, 0U))
> >> +  if (!uncounted_p && maybe_ne (versioning_threshold, 0U))
> >>       {
> >>         tree expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
> >>                                 build_int_cst (TREE_TYPE (scalar_loop_iters),
> >> --
> >> 2.43.0
> >>
>
