https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573

--- Comment #7 from Robin Dapp <rdapp at gcc dot gnu.org> ---
I'm testing a patch that basically does what Richi proposes.

I was also playing around with mixed lane configurations where we potentially
reuse the pointer increment from another pointer update.  To me the code looked
promising and I think we could at least make it work for a subset of lane
configurations.  I didn't manage to get everything correct, though, so the
patch only tries to restore the status quo.

Some info about vsetvl, because the question also came up at the Cauldron:
according to the vector spec it has the (for the compiler) annoying property
that it can set the length freely within a certain range.  This is for
load-balancing reasons and is intended to give hardware implementations more
freedom.  (I'm not sure that is a useful tradeoff, as the compiler's freedom is
significantly reduced.)

vsetvl takes the "application vector length" (AVL), i.e. the total number of
elements the whole loop wants to process, and returns a vl.
VLMAX is the maximum number of elements a single vector (or vector group with
LMUL) can hold.

If the AVL is larger than VLMAX but <= 2 * VLMAX, vsetvl can set vl to any
value in the range [ceil(AVL / 2), VLMAX].
So for e.g. AVL = 37, ceil(37 / 2) = 19 would, unfortunately, be a legal vl
value.
For the other possible values of AVL (<= VLMAX, > 2 * VLMAX) the behavior is as
expected, i.e. vl = min (AVL, VLMAX).
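
To make the three cases concrete, here is a minimal C sketch of the vl bounds
as described above.  The names vl_range and legal_vl_range are made up for
illustration and are neither GCC nor spec identifiers; the function only
models the constraint, not what any particular hardware picks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: inclusive bounds on the vl that vsetvl may
   return for a given AVL.  */
struct vl_range { size_t lo, hi; };

/* Model of the constraint described above (illustrative only).  */
static struct vl_range legal_vl_range(size_t avl, size_t vlmax)
{
    struct vl_range r;
    assert(vlmax > 0);
    if (avl <= vlmax) {
        /* Everything fits in one vector: vl must equal AVL.  */
        r.lo = r.hi = avl;
    } else if (avl < 2 * vlmax) {
        /* The annoying case: anything in [ceil(AVL / 2), VLMAX].  */
        r.lo = (avl + 1) / 2;
        r.hi = vlmax;
    } else {
        /* AVL >= 2 * VLMAX: vl must be VLMAX.  At AVL == 2 * VLMAX the
           range above would collapse to VLMAX anyway.  */
        r.lo = r.hi = vlmax;
    }
    return r;
}
```

Assuming VLMAX = 32 for the AVL = 37 example, this yields the range [19, 32].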

My hope is that most hardware implementations take a saner approach and have
vsetvl always act as "min (AVL, VLMAX)".  That would enable easy scalar
evolution and would possibly also allow mixed-lane settings with reuse of the
vl value.  I suppose we could have a target hook or target query mechanism that
asks for "sane" behavior of vsetvl?  That way we could have optimized SELECT_VL
behavior for those implementations.
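
As a rough illustration of why the "min" behavior is easier to reason about,
the following C sketch strip-mines a loop under two assumed vl policies.  Both
policies and all names here are hypothetical models, not GCC code and not a
description of any real implementation; the "balanced" one simply always picks
the smallest vl from the allowed range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a vsetvl implementation: maps AVL to vl.  */
typedef size_t (*vl_policy)(size_t avl, size_t vlmax);

/* "Sane" policy: vl = min (AVL, VLMAX).  */
static size_t vl_min_policy(size_t avl, size_t vlmax)
{
    return avl < vlmax ? avl : vlmax;
}

/* Load-balancing policy: smallest legal vl when VLMAX < AVL < 2 * VLMAX.  */
static size_t vl_balanced_policy(size_t avl, size_t vlmax)
{
    if (avl <= vlmax)
        return avl;
    if (avl < 2 * vlmax)
        return (avl + 1) / 2;
    return vlmax;
}

/* Strip-mine n elements, record the vl of the first cap iterations in
   out and return the number of iterations.  */
static size_t strip_mine(size_t n, size_t vlmax, vl_policy policy,
                         size_t *out, size_t cap)
{
    size_t iters = 0;
    assert(vlmax > 0);
    while (n > 0) {
        size_t vl = policy(n, vlmax);
        if (iters < cap)
            out[iters] = vl;
        n -= vl;   /* vl elements processed, pointers advance by vl */
        iters++;
    }
    return iters;
}
```

For 100 elements and an assumed VLMAX of 32, the min policy gives the steps
32, 32, 32, 4, so every iteration but the last advances by the constant VLMAX.
The balanced policy gives 32, 32, 18, 18, so the second-to-last step is no
longer known at compile time.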
