> But the vector type we perform the permutation on should be unchanged (it's
> not the punned type but the original type we pun the loaded vector back to)?
Yeah, I was trying to re-use what we have, but I see now that just passing
a different vectype to vect_transform_slp_perm_load doesn't work in all
cases. Apart from that, though, I cannot think of a good or canonical way
of achieving the "filtering" I want. The high-level picture is that every
node only accesses a contiguous part of the group, which is represented in
the load perm.

I guess a more orthodox way would be to try to pun the whole group (of
size 8 here) with a vector element instead of just the number of SLP lanes.
Right now it just fits "by accident". Then introduce load-permutation
handling for the result. That would also involve adjusting ncopies like in
the VMAT_STRIDED_SLP case (thus making gather/scatter more similar to
VMAT_STRIDED_SLP) but is eventually doable. In the end, in my example we'd
have 2x the number of loads, with a larger element size, that would need to
be permuted into place.

Even with that, we'd arrive at a point where we would want to recognize
that only half, a quarter, etc. of a group is actually used in a node and
adjust the pun element size accordingly. So I'm not sure there is a way of
recognizing this from just the group, the gap, or another property. If
there is, I would be glad to use it, but all I can come up with is actually
inspecting the load permutation per node. When it is monotonic/contiguous
we can pun more efficiently, so to speak. Otherwise we need to "capture"
the whole group with a punned element.

--
Regards
 Robin
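
To illustrate the per-node inspection described above, here is a minimal
sketch in plain C. It is not taken from the patch: load_perm_contiguous_p
and pun_lanes are made-up names, not existing GCC functions, and the load
permutation is modelled as a plain array of group indices rather than GCC's
own data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a GCC function): return true if the N entries
   of PERM select consecutive group elements, i.e. PERM[i] == PERM[0] + i.  */
static bool
load_perm_contiguous_p (const unsigned *perm, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (perm[i] != perm[0] + i)
      return false;
  return n > 0;
}

/* Hypothetical helper: number of group elements a single punned element
   has to cover.  A contiguous permutation only needs the node's own N
   lanes; anything else falls back to covering the whole group.  */
static size_t
pun_lanes (const unsigned *perm, size_t n, size_t group_size)
{
  return load_perm_contiguous_p (perm, n) ? n : group_size;
}
```

With a group of size 8, a node whose load permutation is { 4, 5, 6, 7 }
would only need a pun covering 4 elements, while { 0, 2, 4, 6 } would fall
back to covering all 8.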