Yeah...and I also don't like the magic "ceil(AVL / 2) ≤ vl ≤ VLMAX if
AVL < (2 * VLMAX)" rule...
+1, the spec has some description of this, but I am not sure I really get the
point.
From Spec:
"For example, this permits an implementation to set vl = ceil(AVL / 2) for
VLMAX < AVL < 2*VLMAX in order to evenly distribute work over the last two
iterations of a stripmine loop. Requirement 2 ensures that the first stripmine
iteration of reduction loops uses the largest vector length of all iterations,
even in the case of AVL < 2*VLMAX. This allows software to avoid needing to
explicitly calculate a running maximum of vector lengths observed during a
stripmined loop. Requirement 2 also allows an implementation to set vl to
VLMAX for VLMAX < AVL < 2*VLMAX"
Yeah, that's very unfortunate.
The rule is something like
  if AVL >= 2 * VLMAX
    vl = vsetvl = min (AVL, VLMAX)
  if VLMAX < AVL < 2 * VLMAX
    vl = vsetvl = "whatever" ;)
  if AVL <= VLMAX
    vl = vsetvl = min (AVL, VLMAX)
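
To make the "whatever" range concrete, here is a rough C sketch of the bounds
the quoted rule allows. The function names vl_lower_bound and vl_upper_bound
are made up for illustration and are not taken from the spec or from GCC; the
code only models the arithmetic of the rule as stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest vl the rule permits for a given AVL/VLMAX. */
size_t vl_lower_bound(size_t avl, size_t vlmax)
{
    assert(vlmax > 0);
    if (avl <= vlmax)
        return avl;                 /* vl = AVL, no freedom here */
    if (avl < 2 * vlmax)
        return avl / 2 + avl % 2;   /* ceil(AVL / 2) */
    return vlmax;                   /* AVL >= 2 * VLMAX: vl = VLMAX */
}

/* Hypothetical helper: largest vl the rule permits, always min(AVL, VLMAX). */
size_t vl_upper_bound(size_t avl, size_t vlmax)
{
    assert(vlmax > 0);
    return avl < vlmax ? avl : vlmax;
}
```

With VLMAX = 8 and AVL = 10, for instance, any vl from 5 to 8 is allowed, while
for AVL = 3 or AVL = 16 both bounds coincide and the result is just the minimum.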
The idea of load balancing is alright I guess but it really complicates matters
in the compiler.
FWIW my plan for GCC 16 is to define a SELECT_VL_SANE (or any better name I can
come up with) that doesn't have this behavior and always only performs a
minimum instead. This will allow us to perform scalar evolution on vsetvl
rather than giving up as we do right now. Microarchitectures where vsetvl
always behaves like a minimum would then enable the corresponding expander/insn
and others would fall back to the current behavior.
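
As a rough illustration of why the pure minimum is easier to analyze, the
sketch below models a stripmine loop in plain C. Everything in it is
hypothetical: vsetvl_min and vsetvl_balanced are simplified software models,
not real intrinsics, and vsetvl_balanced picks ceil(AVL / 2) as just one of the
choices the spec allows. With the minimum, the remaining element count after k
iterations has a simple closed form; with the balanced variant it does not
follow that form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: vsetvl that always returns min(AVL, VLMAX). */
size_t vsetvl_min(size_t avl, size_t vlmax)
{
    assert(vlmax > 0);
    return avl < vlmax ? avl : vlmax;
}

/* Hypothetical model: vsetvl that returns ceil(AVL / 2) when
   VLMAX < AVL < 2 * VLMAX, and the minimum otherwise. */
size_t vsetvl_balanced(size_t avl, size_t vlmax)
{
    assert(vlmax > 0);
    if (avl > vlmax && avl < 2 * vlmax)
        return avl / 2 + avl % 2;
    return avl < vlmax ? avl : vlmax;
}

/* Elements left after at most k iterations of a stripmine loop over n
   elements, using the given vsetvl model. */
size_t remaining_after(size_t n, size_t vlmax, size_t k,
                       size_t (*vsetvl)(size_t, size_t))
{
    size_t remaining = n;
    for (size_t i = 0; i < k && remaining > 0; i++)
        remaining -= vsetvl(remaining, vlmax);
    return remaining;
}

/* Closed form that holds for the minimum model: max(n - k * VLMAX, 0). */
size_t remaining_closed_form(size_t n, size_t vlmax, size_t k)
{
    assert(vlmax > 0);
    size_t trips = n / vlmax + (n % vlmax != 0);   /* ceil(n / VLMAX) */
    return k >= trips ? 0 : n - k * vlmax;
}
```

For n = 20 and VLMAX = 8 the minimum model leaves 20, 12, 4, 0 elements after
0 to 3 iterations, matching the closed form, whereas the balanced model leaves
20, 12, 6, 0.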
--
Regards
Robin