On 21 Aug 2024, at 11:30, Paul Sandoz wrote:

> Is it possible for the intrinsic to be responsible for wrapping, if needed?
> I was looking at
> [`vpermi2b`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=vpermi2b&ig_expand=4917,4982,5004,5010,5014&techs=AVX_512)
> and AFAICT it implicitly wraps, operating on the lower N bits. Is that
> correct?

That’s not a bad idea. But it is also possible (and routine) for the JIT to take an expression like (i >> (j&31)) down to (i >> j) if the hardware takes care of the (j&31) inside its >> operation. I think that some hardware permutation operations do something similar to >> in that they simply ignore irrelevant bits in the steering indexes. (Other operations do exotic things with irrelevant bits, such as interpreting the sign bit as a command to “force this one to zero”.)

If the wrapping operation for steering indexes is just a vpand against a simple constant, then maybe (maybe!) the JIT can easily drop that vpand when the input is passed to a friendly auto-masking instruction, just like with (i >> (j&31)).

On the other hand, Paul’s idea might be more robust. It would require the permutation intrinsics to apply vpand at the right places, and to omit vpand when possible.

On the other other hand (the first hand), the classic way of doing it doesn’t introduce vpand inside of intrinsics, which has a routine advantage: the vpands introduced outside of the intrinsic can be user-introduced or framework-introduced or both. In all cases, the JIT treats them uniformly and can collapse them together. Putting magic fixup instructions inside of intrinsic expansion risks making them invisible to the routine optimizations of the JIT.

So, assuming the vpand gets good optimization, putting it outside of the intrinsic is the most robust option, as long as “good optimization” includes the >>(j&31) trick for auto-masking instructions. So the intrinsic should look for a vpand in its steering input, and pop off the IR node if the hardware masking is found to produce the same result.
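
To make the (i >> (j&31)) analogy concrete, here is a minimal scalar sketch in Java. It is not HotSpot code and not the Vector API; the class `WrapSketch` and all of its method names are hypothetical, and the "auto-masking permute" is modeled in plain Java under the assumption that the lane count is a power of two. It shows two things: Java already defines int shifts to use only the low five bits of the count, so the explicit mask is redundant, and an external index-wrapping AND is likewise redundant when the consuming permute ignores the high index bits itself.

```java
import java.util.Arrays;

// Hypothetical scalar model; names are illustrative, not taken from the JDK.
final class WrapSketch {
    // Explicitly masked shift count, as a user or framework might write it.
    public static int shiftMasked(int i, int j) { return i >> (j & 31); }

    // Unmasked form; Java int shifts use only the low 5 bits of the count,
    // so this computes the same value as shiftMasked.
    public static int shiftPlain(int i, int j) { return i >> j; }

    // Wrap steering indexes into [0, n) for power-of-two n: one AND per lane,
    // the scalar analogue of a vpand against a constant.
    public static int[] wrapIndexes(int[] idx, int n) {
        if (n <= 0 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("n must be a power of two");
        }
        int[] out = new int[idx.length];
        for (int k = 0; k < idx.length; k++) {
            out[k] = idx[k] & (n - 1);
        }
        return out;
    }

    // Model of a permute that ignores irrelevant index bits ("auto-masking").
    // Assumes src.length is a power of two.
    public static int[] permuteAutoMasking(int[] src, int[] idx) {
        int mask = src.length - 1;
        int[] out = new int[idx.length];
        for (int k = 0; k < idx.length; k++) {
            out[k] = src[idx[k] & mask];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        int[] idx = {5, -1, 2, 8};
        // With and without the external wrap, the result is identical,
        // so the external AND could be dropped before this permute.
        int[] wrapped = permuteAutoMasking(src, wrapIndexes(idx, src.length));
        int[] direct = permuteAutoMasking(src, idx);
        System.out.println(Arrays.equals(wrapped, direct));
        System.out.println(shiftMasked(-1024, 35) == shiftPlain(-1024, 35));
    }
}
```

The equivalence only holds when the external mask keeps exactly the bits the consuming operation looks at. An instruction that gives the sign bit a special meaning, like the "force this one to zero" case above, would not qualify, which is why the check has to be made per instruction.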