https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100077
--- Comment #2 from Michael Matz <matz at gcc dot gnu.org> ---
Yeah, solving this fully requires representing the parameter passing in a better way, one that (a) can be used on the gimple side (where the code is already generated assuming the vec3a params go into memory) and (b) survives the gimple-to-RTL switch, or at least is used during that switch to find a better expansion of the parameter into register loads (using shuffles in this case).

No easy fix :-/

(Note: in normal programs such kernels should be inlined into whatever uses the basic operations in loops, at which point this particular problem of parameter-passing artifacts simply goes away, so it's visible only in micro tests. It's still a problem, of course.)
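To make the "micro test" versus "normal program" distinction concrete, here is a hypothetical C sketch. The `vec3a` definition (three floats, 16-byte aligned) and all function names are assumptions for illustration, not the bug report's actual testcase. The out-of-line kernel keeps its by-value parameters, so any parameter-passing artifacts stay visible; the inlinable version used inside a loop is the typical real-program shape, where those parameters are expected to disappear after inlining.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the bug's vec3a type: three floats with
   16-byte alignment (a GCC/Clang attribute), so the struct is 16 bytes. */
typedef struct {
    float x, y, z;
} __attribute__((aligned(16))) vec3a;

/* Documents the layout assumption made above. */
static_assert(sizeof(vec3a) == 16, "vec3a is assumed to be 16 bytes");

/* Micro-test shape: a basic operation taking its operands by value.
   noinline keeps it out of line, so the code that receives the
   by-value parameters remains in the generated assembly. */
__attribute__((noinline))
vec3a vec3a_add_noinline(vec3a a, vec3a b) {
    vec3a r = { a.x + b.x, a.y + b.y, a.z + b.z };
    return r;
}

/* Same operation, but inlinable: at -O2 GCC will typically inline it
   into the loop below, after which there is no call and hence no
   parameter passing left to produce artifacts. */
static inline vec3a vec3a_add(vec3a a, vec3a b) {
    vec3a r = { a.x + b.x, a.y + b.y, a.z + b.z };
    return r;
}

/* Real-program shape: the basic operation applied in a loop. */
vec3a vec3a_sum(const vec3a *v, size_t n) {
    vec3a acc = { 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < n; ++i)
        acc = vec3a_add(acc, v[i]);
    return acc;
}
```

One way to look at the difference is to compile with something like `gcc -O2 -S` and compare the body of `vec3a_add_noinline` with the loop in `vec3a_sum`; the exact instructions depend on the GCC version and target.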