On Thu, Dec 11, 2025 at 3:17 PM Victor Do Nascimento wrote:
>
> On 12/8/25 12:47, Richard Biener wrote:
> > On Mon, Nov 24, 2025 at 10:21 PM Victor Do Nascimento wrote:
> >>
> >> The iteration count profitability check is irrelevant for uncounted
> >> loops given that, even at runtime, the number of iterations is unknown
> >> at the start of loop execution.
> >>
> >> Likewise, the test for skipping the vectorized version of the loop is
> >> based on whether the number of iterations the loop will run is smaller
> >> than the vectorization factor. As that number is undetermined, the
> >> check can never be performed for uncounted loops.
> >>
> >> Consequently, we skip these checks.
> >
> > Apart from the two other comments I made, the series looks OK overall.
> > Please work with Tamar to get those bits sorted out.
> >
> > On this patch I am now raising the cost model question - you skip any
> > computed profitability check in this patch, but shouldn't you instead
> > have made sure profitability is always there, thus no check is needed
> > in the first place? I'd have expected asserts that we do not run into
> > these code paths with uncounted loops.
> >
> > What is the approach for avoiding turning arbitrary uncounted loops into
> > vector loops and thereby regressing performance significantly?
> >
> > Did you perform any benchmark runs with the patch set in?
>
> I ran SPEC CPU 2017 at various stages of development for AArch64
> and found no statistically significant regressions. I've run the
> equivalent benchmarks on x86_64, with similar results.
>
> As for your concern about regressions: over-eagerly vectorizing loops
> whose vectorization overhead is too expensive for short runs, and then
> having the resulting vector loop execute unconditionally, runs counter
> to the goal of improved performance.
>
> Consequently, knowing that no one benchmark will exhaustively represent
> all real-life workloads, I have been thinking about how we can tweak the
> cost model to effectively be more selective in the loops we vectorize,
> given there's no check we can run on niters for profitability, even at
> runtime.
>
> Intuitively, my thought is to have some scalar penalty scaling factor we
> can apply to the single iteration cost of the vectorized loop.
>
> For example, if we have a VF of 8, which at the very least requires that
> the cost of one vectorized iteration be no more than 8x the cost of one
> scalar iteration, we can be more stringent and halve this threshold, so
> that vectorization is only deemed profitable if the vectorized loop is
> worth it even when the loop runs for only half a vector's length. This
> would make the vectorized loop more palatable for workloads where the
> loop is not often run for many iterations.
>
> While it is a fairly primitive approach, it may well serve as a positive
> initial improvement to be made, which can be subsequently tweaked if
> deemed too conservative.

Yes, I thought of something like this, but more aggressive: requiring the
vector loop to be less than 2x as expensive as a single scalar iteration ;)
I guess we can for now leave this up to targets, which can check
for the uncounted loop case in the finish_cost hook. But I'd say we want
to make sure the 'cheap' cost model (aka what we do at -O2 by default)
doesn't vectorize uncounted loops? OTOH, at this point we can also
wait and see until we get actual problematic cases and adjust costing
during stage4.
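
Below is a minimal C++ sketch of the two thresholds discussed in this thread. The first halves the VF-based break-even bound. The second requires that the vector loop cost less than two scalar iterations. The function names and the `penalty` parameter are hypothetical and are not part of GCC's cost-model interface. Costs are treated as abstract unsigned integers. Where such a check would actually live, for example a target's finish_cost hook as suggested above, is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not GCC's cost-model API.  vec_iter_cost is the
// cost of one iteration of the vector loop, scalar_iter_cost the cost of
// one scalar iteration, both in abstract target-defined units.

// With penalty == 1 this is the usual break-even condition: one vector
// iteration replaces vf scalar iterations, so it may cost at most vf scalar
// iterations.  penalty == 2 halves that budget, i.e. the vector loop must
// pay off even if only half a vector's worth of scalar iterations run.
bool
penalized_vector_cost_ok_p (std::uint64_t vec_iter_cost,
                            std::uint64_t scalar_iter_cost,
                            std::uint64_t vf, std::uint64_t penalty)
{
  assert (penalty != 0);
  // vec_iter_cost <= (vf / penalty) * scalar_iter_cost, rearranged so that
  // no integer division truncates the budget when vf % penalty != 0.
  return vec_iter_cost * penalty <= vf * scalar_iter_cost;
}

// Stricter alternative: the vector loop body must cost less than two
// scalar iterations, independent of vf.
bool
under_two_scalar_iters_p (std::uint64_t vec_iter_cost,
                          std::uint64_t scalar_iter_cost)
{
  return vec_iter_cost < 2 * scalar_iter_cost;
}
```

Consider a scalar iteration cost of 10 and VF = 8. A vector body costing 60 passes the baseline check (60 <= 80) but fails once the penalty is 2 (60 * 2 = 120 > 80). Under the 2x bound, the body would have to cost less than 20.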

I believe that, apart from the ChangeLog/comment issue, the patch series
is ready to commit.

Thanks,
Richard.
>
> Thanks,
> Victor
>
> > Thanks,
> > Richard.
> >
> >> gcc/ChangeLog:
> >>
> >> 	* tree-vect-loop-manip.cc (vect_loop_versioning): Skip
> >> 	profitability check for uncounted loops.
> >> 	(vect_do_peeling): Disable vector loop skip check for
> >> 	uncounted loops.
> >> ---
> >> gcc/tree-vect-loop-manip.cc | 17 ++++++++++-------
> >> 1 file changed, 10 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> >> index 3caaba30897..6bc474a3de0 100644
> >> --- a/gcc/tree-vect-loop-manip.cc
> >> +++ b/gcc/tree-vect-loop-manip.cc
> >> @@ -3289,11 +3289,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >>       because we have asserted that there are enough scalar iterations to enter
> >>       the main loop, so this skip is not necessary.  When we are versioning then
> >>       we only add such a skip if we have chosen to vectorize the epilogue.  */
> >> -  bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >> -                      ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
> >> -                                  bound_prolog + bound_epilog)
> >> -                      : (!LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING (loop_vinfo)
> >> -                         || vect_epilogues));
> >> +  bool skip_vector = false;
> >> +  if (!uncounted_p)
> >> +    skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >> +                   ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
> >> +                               bound_prolog + bound_epilog)
> >> +                   : (!LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING (loop_vinfo)
> >> +                      || vect_epilogues));
> >>
> >>    /* Epilog loop must be executed if the number of iterations for epilog
> >>       loop is known at compile time, otherwise we need to add a check at
> >> @@ -4152,13 +4154,14 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
> >>    tree version_simd_if_cond
> >>      = LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo);
> >>    unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> >> +  bool uncounted_p = LOOP_VINFO_NITERS_UNCOUNTED_P (loop_vinfo);
> >>
> >> -  if (vect_apply_runtime_profitability_check_p (loop_vinfo)
> >> +  if (!uncounted_p && vect_apply_runtime_profitability_check_p (loop_vinfo)
> >>        && !ordered_p (th, versioning_threshold))
> >>      cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
> >>                               build_int_cst (TREE_TYPE (scalar_loop_iters),
> >>                                              th - 1));
> >> -  if (maybe_ne (versioning_threshold, 0U))
> >> +  if (!uncounted_p && maybe_ne (versioning_threshold, 0U))
> >>      {
> >>        tree expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
> >>                                 build_int_cst (TREE_TYPE (scalar_loop_iters),
> >> --
> >> 2.43.0
> >>
>
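
To make the effect of the two hunks easier to follow, here is a simplified C++ model of the decisions they change. It is a sketch under stated assumptions. The struct and functions are hypothetical stand-ins for the LOOP_VINFO_* accessors used in the diff, and poly_int comparisons (maybe_lt, maybe_ne) are reduced to plain integer comparisons. From vect_loop_versioning, only the test guarded by maybe_ne (versioning_threshold, 0U) is modelled, not the cost-model threshold check.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model; none of these names exist in GCC.
struct loop_model
{
  bool uncounted_p;                          // ~ LOOP_VINFO_NITERS_UNCOUNTED_P
  std::optional<std::uint64_t> known_niters; // set iff ~ LOOP_VINFO_NITERS_KNOWN_P
  bool versioning_without_peeling; // ~ LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING
  bool vect_epilogues;
};

// Mirrors the skip_vector computation in vect_do_peeling after the patch.
bool
skip_vector_p (const loop_model &lm, std::uint64_t bound_prolog,
               std::uint64_t bound_epilog)
{
  // Modelling assumption: an uncounted loop has no compile-time niters.
  assert (!(lm.uncounted_p && lm.known_niters));
  if (lm.uncounted_p)
    return false; // Nothing to compare against, even at runtime.
  if (lm.known_niters)
    return *lm.known_niters < bound_prolog + bound_epilog;
  return !lm.versioning_without_peeling || lm.vect_epilogues;
}

// Mirrors the second condition in vect_loop_versioning: after the patch the
// scalar_loop_iters test is only built for counted loops.
bool
emit_niters_versioning_check_p (const loop_model &lm,
                                std::uint64_t versioning_threshold)
{
  return !lm.uncounted_p && versioning_threshold != 0;
}
```

Under this model, an uncounted loop never receives either guard, which matches the behaviour described in the commit message. The alternative raised in the review would instead assert that these paths are never reached for uncounted loops.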