https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573
--- Comment #7 from Robin Dapp <rdapp at gcc dot gnu.org> ---

I'm testing a patch that basically does what Richi proposes. I was also playing around with mixed lane configurations where we potentially reuse the pointer increment from another pointer update. To me the code looked promising and I think we could at least make it work for a subset of lane configurations. I didn't manage to get everything correct, though, so the patch only tries to restore the status quo.

Some info about vsetvl because the question also came up at the Cauldron: according to the vector spec it has the (for the compiler) annoying property that it can basically set the length freely within a certain range. This is for load-balancing reasons and intended to give hardware implementations more freedom. (I'm not sure that is a useful tradeoff as the compiler's freedom is significantly reduced.)

vsetvl takes the "application vector length" (AVL), i.e. the total number of elements the whole loop wants to process, and returns a vl. VLMAX is the maximum number of elements a single vector (or vector group with LMUL) can hold.

If the AVL is larger than VLMAX but <= 2 * VLMAX, vsetvl can set vl to any value inside the range [ceil(AVL / 2), VLMAX]. So for e.g. AVL = 37 and, say, VLMAX = 32, ceil(37 / 2) = 19 would, unfortunately, be a legal vl value. For the other possible values of AVL (<= VLMAX, > 2 * VLMAX) the behavior is as expected: vl = AVL and vl = VLMAX, respectively.

My hope is that most hardware implementations would take a saner approach and have vsetvl always act as "min (AVL, VLMAX)". That would enable easy scalar evolution and would possibly also allow mixed-lane settings with reuse of the vl value.

I suppose we could have a target hook or target query mechanism that asks for "sane" behavior of vsetvl? That way we could have optimized SELECT_VL behavior for those implementations.
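
To make the vl rule described above concrete, here is a minimal C sketch of it. This is only a model of the three AVL cases as stated in this comment, not code from GCC or from the spec; the names vl_range, legal_vl and sane_vl are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: inclusive range of vl values that
   vsetvl is allowed to return for one (AVL, VLMAX) pair.  */
struct vl_range { size_t lo, hi; };

/* Model of the rule described in the comment above (illustrative only).  */
static struct vl_range legal_vl(size_t avl, size_t vlmax)
{
    struct vl_range r;
    assert(vlmax > 0);
    if (avl <= vlmax) {
        /* Everything fits into one vector (group): vl = AVL.  */
        r.lo = r.hi = avl;
    } else if (avl <= 2 * vlmax) {
        /* The "annoying" case: any value in [ceil(AVL / 2), VLMAX].  */
        r.lo = (avl + 1) / 2;
        r.hi = vlmax;
    } else {
        /* More than two full vectors left: vl = VLMAX.  */
        r.lo = r.hi = vlmax;
    }
    return r;
}

/* The "sane" behavior hoped for above: vl = min (AVL, VLMAX).  */
static size_t sane_vl(size_t avl, size_t vlmax)
{
    return avl < vlmax ? avl : vlmax;
}
```

With AVL = 37 and VLMAX = 32, legal_vl gives the range [19, 32] while sane_vl gives 32. The compiler can only reason about the per-iteration increment as a simple function of the remaining element count when the range collapses to a single value, which is always the case for sane_vl.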