> But the vector type we perform the permutation on should be unchanged (it's
> not the punned type but the original type we pun the loaded vector back to)?

Yeah, I was trying to re-use what we have but I see now that just passing a 
different vectype to vect_transform_slp_perm_load doesn't work in all cases.

But apart from that I cannot think of a good or canonical way of achieving the
"filtering" I want.  The high-level picture is that every node only accesses a
contiguous part of the group, and that part is what the node's load perm
represents.

I guess a more orthodox way would be to try to pun the whole group (of size 8 
here) with a vector element instead of just the number of SLP lanes.  Right now 
it just fits "by accident".  Then introduce load-permutation handling for the 
result.  That would also involve adjusting ncopies like in the VMAT_STRIDED_SLP 
case (thus making gather/scatter more similar to VMAT_STRIDED_SLP) but is 
eventually doable.

In my example we'd end up with 2x the number of loads, each with a larger
element size, whose results would then need to be permuted into place.  Even
then we'd arrive at a point where we would want to recognize that only a half,
a quarter, etc. of a group is actually used in a node and adjust the pun
element size accordingly.

So I'm not sure there is a way of recognizing this from just the group, the
gap, or another property.  If there is, I would be glad to use it, but all I
can come up with is actually inspecting the load permutation per node.  When it
is monotonic/contiguous we can pun more efficiently, so to speak.  Otherwise we
need to "capture" the whole group with a punned element.
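
To illustrate the per-node check, here is a rough standalone sketch.  It is not
GCC code: contiguous_slice, contiguous_load_perm and pun_bits are hypothetical
names, the load permutation is modelled as a plain std::vector instead of the
vectorizer's own data structures, and the element width passed in is just an
assumed parameter.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper type (not part of GCC): the contiguous slice of a
// load group that a single SLP node reads.
struct contiguous_slice
{
  unsigned first;  // first group element the node accesses
  unsigned nelts;  // number of consecutive group elements accessed
};

// Return the slice if LOAD_PERM has the form { k, k+1, ..., k+n-1 }.
// Return std::nullopt for an empty or non-contiguous permutation.
std::optional<contiguous_slice>
contiguous_load_perm (const std::vector<unsigned> &load_perm)
{
  if (load_perm.empty ())
    return std::nullopt;
  for (std::size_t i = 1; i < load_perm.size (); ++i)
    if (load_perm[i] != load_perm[i - 1] + 1)
      return std::nullopt;
  return contiguous_slice { load_perm[0], (unsigned) load_perm.size () };
}

// Width in bits of the punned element for one node: just the accessed
// slice when the permutation is contiguous, the whole group otherwise.
unsigned
pun_bits (const std::vector<unsigned> &load_perm, unsigned group_size,
          unsigned elt_bits)
{
  std::optional<contiguous_slice> slice = contiguous_load_perm (load_perm);
  return (slice ? slice->nelts : group_size) * elt_bits;
}
```

With a group of size 8 and (assumed) 8-bit elements, a node with load perm
{ 4, 5, 6, 7 } would get a 32-bit punned element starting at group element 4,
while { 0, 2, 4, 6 } falls back to capturing the whole group with 64 bits.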

-- 
Regards
 Robin
