On 21 Aug 2024, at 10:51, Sandhya Viswanathan wrote:

> @jatin-bhateja Thanks, the PR (https://github.com/openjdk/jdk/pull/20634) is
> still work in progress and can be simplified much further. The changes I am
> currently working on are: do wrap by default for rearrange and selectFrom as
> suggested by John and Paul, no additional API with boolean wrap as parameter,
> and no changes to shuffle constructors.
Yes, thank you Sandhya; this is the destination I hope to arrive at. Not necessarily 100% in this PR, but this PR should be consistent with it.

…To review: Shuffles store their indexes “partially wrapped” so as to preserve information about which indexes were out of bounds, but they also preserve every index value mod VLEN. It’s always an option, though not a requirement, to fully wrap, removing the OOB information and reducing every index to 0..VLEN-1.

When using a vector instead of a shuffle for steering, we think of this as first creating a temporary shuffle, then doing the appropriate operation(s). But for best instruction selection, we have found that it’s fastest to force every index down to 0..VLEN-1 immediately, at least in the one-input vector case, and to a doubled dynamic range, mod 2VLEN, in the two-input case. There’s always an equivalent expression that uses an explicit shuffle to carry either VLEN (fully wrapped) or 2VLEN (partially wrapped) indexes. For the vector-steered version we implement only the most favorable pattern of shuffle usage, one that never throws. And of course we don’t build a temporary shuffle either.
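For illustration only, here is a minimal scalar sketch of the wrapping rules above, written in plain Java rather than against jdk.incubator.vector. The class and method names (ShuffleWrapModel, partialWrap, fullWrap, selectFrom) are hypothetical, int arrays stand in for vectors and shuffles, and the partial-wrap encoding (an out-of-bounds index mapped into -VLEN..-1, keeping its value mod VLEN) is an assumed representation, not a quote of the JDK code.

```java
import java.util.Arrays;

/**
 * Hypothetical scalar model of shuffle index wrapping.
 * Not the JDK implementation; arrays stand in for vectors, VLEN = array length.
 */
public class ShuffleWrapModel {

    // Partial wrap (assumed encoding): in-range indexes are kept as-is; an
    // out-of-range index is reduced mod VLEN and shifted into -VLEN..-1, so it
    // still carries (index mod VLEN) while its sign marks it as out of bounds.
    static int partialWrap(int index, int vlen) {
        int wrapped = Math.floorMod(index, vlen);
        return (wrapped == index) ? wrapped : wrapped - vlen;
    }

    // Full wrap: drop the OOB information, leaving only 0..VLEN-1.
    static int fullWrap(int index, int vlen) {
        return Math.floorMod(index, vlen);
    }

    // One-input, vector-steered selection: every steering index is fully
    // wrapped up front, so this never throws and no temporary shuffle is built.
    static int[] selectFrom(int[] indexes, int[] src) {
        int vlen = src.length;
        int[] out = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            out[i] = src[fullWrap(indexes[i], vlen)];
        }
        return out;
    }

    // Two-input, vector-steered selection: indexes are wrapped mod 2*VLEN;
    // 0..VLEN-1 selects from src1 and VLEN..2*VLEN-1 selects from src2.
    static int[] selectFrom(int[] indexes, int[] src1, int[] src2) {
        int vlen = src1.length;
        int[] out = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int j = Math.floorMod(indexes[i], 2 * vlen);
            out[i] = (j < vlen) ? src1[j] : src2[j - vlen];
        }
        return out;
    }

    public static void main(String[] args) {
        int vlen = 4;
        int[] raw = {0, 3, 5, -1, 9};
        int[] partial = new int[raw.length];
        int[] full = new int[raw.length];
        for (int k = 0; k < raw.length; k++) {
            partial[k] = partialWrap(raw[k], vlen);
            full[k] = fullWrap(raw[k], vlen);
        }
        System.out.println(Arrays.toString(partial)); // [0, 3, -3, -1, -3]
        System.out.println(Arrays.toString(full));    // [0, 3, 1, 3, 1]

        int[] steer = {5, -1, 2, 8};
        int[] a = {10, 11, 12, 13};
        int[] b = {20, 21, 22, 23};
        System.out.println(Arrays.toString(selectFrom(steer, a)));    // [11, 13, 12, 10]
        System.out.println(Arrays.toString(selectFrom(steer, a, b))); // [21, 23, 12, 10]
    }
}
```

In this model both selectFrom variants accept any int as a steering index and never throw, which mirrors the "most favorable pattern" described above; the partially wrapped form is kept only to show what information an explicit shuffle would additionally carry.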