On 12/8/25 12:47, Richard Biener wrote:
On Mon, Nov 24, 2025 at 10:21 PM Victor Do Nascimento
<[email protected]> wrote:
The iteration count profitability check is irrelevant for uncounted
loops given that, even at runtime, the number of iterations is unknown
at the start of loop execution.
Likewise, the test for skipping the vectorized version of the loop is
based on whether the number of iterations that will be run for the
loop is smaller than the vectorization factor. As this number is
undetermined, the check can never be run for uncounted loops.
Consequently, we skip these checks.
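For illustration only, here is a minimal sketch of the distinction the
commit message relies on. The function names are hypothetical and are not
taken from the patch or from the GCC sources; the assumption is that an
"uncounted" loop is one whose exit depends on data read inside the loop,
so no iteration count exists on entry to compare against a threshold.

```cpp
#include <cstddef>

// Counted loop: the trip count `n` is available on entry, so a runtime
// check such as "n >= threshold" could guard a vectorized version.
int sum_counted (const int *a, std::size_t n)
{
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}

// Uncounted loop: the exit condition depends on the loaded data, so there
// is no iteration count to test before the loop starts executing.
// The caller must guarantee that a negative element exists.
std::size_t find_first_negative (const int *a)
{
  std::size_t i = 0;
  while (a[i] >= 0)
    ++i;
  return i;
}
```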
Apart from the two other comments I made, the series looks OK overall.
Please work with Tamar to get those bits sorted out.
On this patch I am now raising the cost model question - you skip any
computed profitability check in this patch, but shouldn't you instead
have made sure profitability is always there, thus no check is needed
in the first place? I'd have expected asserts that we do not run into
these code paths with uncounted loops.
What is the approach for avoiding turning random uncounted loops into
vector loops and thus regressing performance a lot?
Did you perform any benchmark runs with the patch set in?
I ran SPEC CPU 2017 at various stages through development for AArch64
and found no regressions that were of any statistical significance.
I've run the equivalent benchmarks on x86_64, with similar results.
As for your concerns about regressions: over-eagerly vectorizing loops
where the resulting overhead is too expensive for short runs, and
having the resulting loop execute unconditionally, runs counter to the
goal of improved performance.
Consequently, knowing that no one benchmark will exhaustively represent
all real-life workloads, I have been thinking about how we can tweak the
cost model to effectively be more selective in the loops we vectorize,
given there's no check we can run on niters for profitability, even at
runtime.
Intuitively, my thought is to have some scalar penalty scaling factor we
can apply to the single iteration cost of the vectorized loop.
E.g. if we have a VF of 8, break-even requires that the cost of one
vectorized iteration be no more than 8x the scalar iteration cost. We
can be more stringent and halve this threshold, such that vectorization
is only deemed profitable if the vectorized loop is worth it even when
the loop runs for just half a vector's length. This would make the
vectorized loop more palatable for workloads where the loop is not
often run for many iterations.
While it is a fairly primitive approach, it may well serve as a positive
initial improvement to be made, which can be subsequently tweaked if
deemed too conservative.
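As a rough sketch of the idea above (not code from the patch, and not an
existing GCC interface), the scaled comparison could look as follows. The
function name, its parameters and the `penalty` divisor are all
hypothetical; costs are assumed to be plain unsigned integers in the same
unit.

```cpp
// Hypothetical helper: decide whether an uncounted loop is worth
// vectorizing when no runtime check on niters is possible.
//   scalar_iter_cost  cost of one scalar iteration
//   vector_iter_cost  cost of one vector iteration (covers VF elements)
//   vf                vectorization factor
//   penalty           assumed scaling divisor: 1 is plain break-even,
//                     2 demands a win at half a vector's length
constexpr bool
uncounted_loop_profitable_p (unsigned scalar_iter_cost,
                             unsigned vector_iter_cost,
                             unsigned vf, unsigned penalty)
{
  // Break-even is vector_iter_cost <= vf * scalar_iter_cost.  Dividing
  // the right-hand side by `penalty` tightens it; multiplying the
  // left-hand side instead avoids integer division.
  return vector_iter_cost * penalty <= vf * scalar_iter_cost;
}

// VF of 8 and a scalar iteration cost of 4: break-even threshold is 32,
// the halved threshold is 16.
static_assert (uncounted_loop_profitable_p (4, 20, 8, 1),
               "20 <= 32: accepted at plain break-even");
static_assert (!uncounted_loop_profitable_p (4, 20, 8, 2),
               "40 > 32: rejected once the threshold is halved");
```

A divisor keeps the check in integer arithmetic; whether such a factor
would be fixed, target-specific or exposed as a parameter is left open
here.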
Thanks,
Victor
Thanks,
Richard.
gcc/ChangeLog:
* tree-vect-loop-manip.cc (vect_loop_versioning): Skip
profitability check for uncounted loops.
(vect_do_peeling): Disable vector loop skip checking for
uncounted loops.
---
gcc/tree-vect-loop-manip.cc | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3caaba30897..6bc474a3de0 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3289,11 +3289,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
because we have asserted that there are enough scalar iterations to enter
the main loop, so this skip is not necessary. When we are versioning then
we only add such a skip if we have chosen to vectorize the epilogue. */
- bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
- ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
- bound_prolog + bound_epilog)
- : (!LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING (loop_vinfo)
- || vect_epilogues));
+ bool skip_vector = false;
+ if (!uncounted_p)
+ skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+ ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
+ bound_prolog + bound_epilog)
+ : (!LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING (loop_vinfo)
+ || vect_epilogues));
/* Epilog loop must be executed if the number of iterations for epilog
loop is known at compile time, otherwise we need to add a check at
@@ -4152,13 +4154,14 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
tree version_simd_if_cond
= LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo);
unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
+ bool uncounted_p = LOOP_VINFO_NITERS_UNCOUNTED_P (loop_vinfo);
- if (vect_apply_runtime_profitability_check_p (loop_vinfo)
+ if (!uncounted_p && vect_apply_runtime_profitability_check_p (loop_vinfo)
&& !ordered_p (th, versioning_threshold))
cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
build_int_cst (TREE_TYPE (scalar_loop_iters),
th - 1));
- if (maybe_ne (versioning_threshold, 0U))
+ if (!uncounted_p && maybe_ne (versioning_threshold, 0U))
{
tree expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
build_int_cst (TREE_TYPE (scalar_loop_iters),
--
2.43.0