Hello,

I’ve been optimizing various code sequences relating to vector rotates recently. I ended up proposing that we expand the vector-rotate-by-immediate optab rotlm3 for the Advanced SIMD (Neon) modes here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665635.html

This expands to a ROTATE RTL code that can later be combined into more complex instructions like XAR, and for certain rotate amounts it can be optimized into a single instruction. If neither optimization applies, a splitter breaks it down into an SHL + USRA pair.
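
As a rough illustration of that fallback, here is a scalar sketch for a single 32-bit lane. It is not the actual expander, and the function names are made up for this example. Rotating left by n is the OR of a left shift by n and a right shift by 32 - n. The two shifted values never have a set bit in the same position, so accumulating the right shift with an add, which is what USRA does, gives the same result as ORing it in.

```c
#include <stdint.h>

/* Rotate-left idiom on one 32-bit lane.
   Assumption for this sketch: n is in the range 1..31.  */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* The same rotate written as shift-left followed by
   shift-right-and-accumulate.  The shifted halves share no set bits,
   so '+' produces the same bits as '|'.  */
static uint32_t rotl32_shl_usra(uint32_t x, unsigned n)
{
    uint32_t acc = x << n;   /* SHL step */
    acc += x >> (32 - n);    /* USRA step: add the right-shifted value */
    return acc;
}
```

For example, both rotl32(0x12345678, 8) and rotl32_shl_usra(0x12345678, 8) give 0x34567812.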

For SVE, because we have predicates in the general case, it’s not feasible to detect these rotates at the RTL level, so I was hoping that GIMPLE could do it. Indeed, GIMPLE has many places where it can detect rotate idioms: forwprop1, bswap detection, pattern matching in the vectorizer, match.pd for simple cases etc. The vectorizer is probably a good place to do it (rather than asking the other places to deal with VLA types), but I think it would need the target to affirm that it supports SVE vector rotates through the rotlm3 optab, hence my question.

Though some rotate amounts can be implemented with a single instruction (REVB, REVH, REVW), the fallback expansion for TARGET_SVE2 would be a two-instruction LSL+USRA, which is better than what we currently emit in the motivating test case:
https://godbolt.org/z/o55or8hYv

We currently cannot combine the LSL+LSR+ORR sequence because the predicates get in the way during combine (even though the instructions involved are actually unpredicated and the predicate would get dropped later anyway).

It would also allow us to keep an RTL-level ROTATE long enough to combine it into the XAR and RAX instructions from TARGET_SVE2_SHA3.

Finally, it would allow us to experiment with better SVE-specific rotate sequences in the future. For example, we could consider emitting high-throughput TBLs for rotate amounts that are a multiple of 8.

I’m suggesting doing this for TARGET_SVE2 as we have the combined USRA instruction there, but I wouldn’t object to doing this for TARGET_SVE.

Thanks,
Kyrill