Yeah...and I also don't like the magic "ceil(AVL / 2) ≤ vl ≤ VLMAX if
AVL < (2 * VLMAX)" rule...

+1. The spec has some description of this, but I am not sure I really get the
point.

From the spec:

"For example, this permits an implementation to set vl = ceil(AVL / 2) for VLMAX < AVL < 2*VLMAX in order to evenly distribute work over the last two iterations of a stripmine loop. Requirement 2 ensures that the first stripmine iteration of reduction loops uses the largest vector length of all iterations, even in the case of AVL < 2*VLMAX. This allows software to avoid needing to explicitly calculate a running maximum of vector lengths observed during a stripmined loop. Requirement 2 also allows an implementation to set vl to VLMAX for VLMAX < AVL < 2*VLMAX."
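To make the spec's example concrete, here is a small Python simulation of a stripmine loop that records the vl chosen on each iteration. It is not RVV code: `vl_min` and `vl_balanced` are hypothetical names for two policies an implementation might pick, and `stripmine` is a hypothetical helper.

```python
from typing import Callable


def vl_min(avl: int, vlmax: int) -> int:
    """Plain minimum: vl = min(AVL, VLMAX)."""
    return min(avl, vlmax)


def vl_balanced(avl: int, vlmax: int) -> int:
    """Split the last two iterations evenly when VLMAX < AVL < 2*VLMAX."""
    if vlmax < avl < 2 * vlmax:
        return (avl + 1) // 2  # ceil(AVL / 2) using integer arithmetic
    return min(avl, vlmax)


def stripmine(n: int, vlmax: int, policy: Callable[[int, int], int]) -> list[int]:
    """Return the vl chosen on each iteration of a stripmine loop over n elements."""
    vls = []
    remaining = n
    while remaining > 0:
        vl = policy(remaining, vlmax)  # vsetvl with AVL = remaining elements
        vls.append(vl)
        remaining -= vl
    return vls


print(stripmine(20, 8, vl_min))       # [8, 8, 4]
print(stripmine(20, 8, vl_balanced))  # [8, 6, 6]
print(stripmine(10, 8, vl_balanced))  # [5, 5]
```

With the balanced policy the tail work is spread over the last two iterations, and the first iteration still has the largest vl, which is the property Requirement 2 is meant to guarantee.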

Yeah, that's very unfortunate.

The rule is something like:

  if AVL >= 2 * VLMAX
    vl = vsetvl = min (AVL, VLMAX)

  if VLMAX < AVL < 2 * VLMAX
    vl = vsetvl = "whatever" in [ceil (AVL / 2), VLMAX] ;)

  if AVL <= VLMAX
    vl = vsetvl = min (AVL, VLMAX)
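The three cases can be written as a small Python helper that returns the inclusive range of vl values permitted for a given AVL and VLMAX. `allowed_vl` is a hypothetical name, and the sketch models only the constraints quoted above, not any particular hardware.

```python
def allowed_vl(avl: int, vlmax: int) -> tuple[int, int]:
    """Inclusive (lo, hi) range of vl values vsetvl may return."""
    if avl <= vlmax:
        return (avl, avl)               # exact: vl = AVL
    if avl < 2 * vlmax:
        return ((avl + 1) // 2, vlmax)  # implementation's choice: ceil(AVL/2)..VLMAX
    return (vlmax, vlmax)               # exact: vl = VLMAX


print(allowed_vl(5, 8))   # (5, 5)
print(allowed_vl(12, 8))  # (6, 8)
print(allowed_vl(16, 8))  # (8, 8)
```

Only the middle case leaves freedom. A vsetvl that always computes min(AVL, VLMAX) simply picks the upper bound `hi` in every case.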

The idea of load balancing is alright, I guess, but it really complicates matters in the compiler.

FWIW, my plan for GCC 16 is to define a SELECT_VL_SANE (or whatever better name I can come up with) that doesn't have this behavior and always just computes a minimum. This will allow us to perform scalar evolution on vsetvl rather than giving up as we do right now. Microarchitectures where vsetvl always behaves like a minimum would then enable the corresponding expander/insn, while others would fall back to the current behavior.
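As a rough illustration of why a minimum-only vsetvl helps: the remaining AVL after k iterations is N - k * VLMAX, an affine recurrence, so each iteration's vl and the trip count have closed forms. The Python below only checks that arithmetic against a simulated loop; it says nothing about GCC's actual scalar evolution code, and `vl_closed_form`, `trip_count` and `simulate` are hypothetical names.

```python
def vl_closed_form(n: int, vlmax: int, k: int) -> int:
    """vl on iteration k (0-based) when vsetvl behaves like min(AVL, VLMAX)."""
    return min(n - k * vlmax, vlmax)


def trip_count(n: int, vlmax: int) -> int:
    """Number of stripmine iterations, ceil(n / vlmax)."""
    return -(-n // vlmax)


def simulate(n: int, vlmax: int) -> list[int]:
    """Run the loop with a min-only vsetvl and record each vl."""
    vls, remaining = [], n
    while remaining > 0:
        vl = min(remaining, vlmax)
        vls.append(vl)
        remaining -= vl
    return vls


# The closed forms agree with the simulated loop for every tested size.
for n in range(1, 50):
    for vlmax in (4, 8, 16):
        vls = simulate(n, vlmax)
        assert len(vls) == trip_count(n, vlmax)
        assert vls == [vl_closed_form(n, vlmax, k) for k in range(len(vls))]
```

With the load-balancing behavior, the second-to-last vl may be any value in [ceil(AVL / 2), VLMAX] depending on the implementation, so the per-iteration step is no longer a single known constant.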

--
Regards
Robin
