[RFC][AArch64] Defining lrotm3 optabs for SVE modes for TARGET_SVE2?

Kyrylo Tkachov via Gcc Thu, 17 Oct 2024 08:06:03 -0700

Hello,

I’ve been optimizing various code sequences relating to vector rotates recently.
I ended up proposing we expand the vector-rotate-by-immediate optab rotlm3 for
the Advanced SIMD (Neon) modes here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665635.html
This expands to a ROTATE RTL code that can later be combined into more complex
instructions like XAR, and for certain rotate amounts can be optimized into a
single instruction.
If it fails to be optimized, a splitter breaks it down into an SHL + USRA
pair.
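
As a rough scalar model of that SHL + USRA split (a sketch only: the function
names are made up for illustration and this is not the splitter code from the
patch), one 32-bit lane could be written as below. USRA adds the shifted value
into the accumulator, which matches an OR here because the two shifted halves
occupy disjoint bits.

```c
#include <assert.h>
#include <stdint.h>

/* Reference rotate-left of one 32-bit lane by a constant in 1..31. */
static uint32_t rotl32_ref(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return (x << n) | (x >> (32 - n));
}

/* Hypothetical model of USRA on one lane: acc += (x >> shift), using an
   unsigned (logical) right shift. */
static uint32_t usra_lane(uint32_t acc, uint32_t x, unsigned shift)
{
    return acc + (x >> shift);
}

/* The same rotate expressed as SHL followed by USRA. */
static uint32_t rotl32_shl_usra(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t hi = x << n;              /* SHL: the low n bits are now zero */
    return usra_lane(hi, x, 32 - n);   /* USRA: fills only those low n bits */
}
```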


For SVE, because we have predicates in the general case, it’s not feasible to
detect these rotates at the RTL level, so I was hoping that GIMPLE could do it.
Indeed, GIMPLE has many places where it can detect rotate idioms: forwprop1,
bswap detection, pattern matching in the vectorizer, match.pd for simple cases,
etc.
The vectorizer is probably a good place to do it (rather than asking the other
places to deal with VLA types), but I think it would need the target to affirm
that it supports SVE vector rotates through the rotlm3 optab, hence my
question.
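
For concreteness, the kind of source-level rotate idiom in question looks
roughly like the loop below. This is a hypothetical example (the function name
and the rotate amount of 7 are picked arbitrarily); it is not the test case
behind any link in this mail.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate every 32-bit element left by 7 using the usual shift/shift/OR
   idiom.  The rotate amount is a compile-time constant, which is the
   rotate-by-immediate case discussed above. */
void rotate_lanes(uint32_t *restrict dst, const uint32_t *restrict src,
                  size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (src[i] << 7) | (src[i] >> 25);
}
```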

Though some rotate amounts can be implemented with a single instruction (REVB,
REVH, REVW), the fallback expansion for TARGET_SVE2 would be a two-instruction
LSL+USRA, which is better than what we currently emit in the motivating test
case:
https://godbolt.org/z/o55or8hYv
We currently cannot combine the LSL+LSR+ORR sequence because the predicates get
in the way during combine (even though the instructions involved are actually
unpredicated and the predicate would get dropped later anyway).
It would also allow us to keep an RTL-level ROTATE long enough to combine it
into the XAR and RAX instructions from TARGET_SVE2_SHA3.
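
To illustrate the single-instruction cases: a rotate by half the element width
is the same as swapping the two halves of the element, which is what REVB
(bytes in a 16-bit element), REVH (halfwords in a 32-bit element) and REVW
(words in a 64-bit element) do. The scalar sketch below uses helper names I
made up for the illustration; it models one element and is not GCC code.

```c
#include <stdint.h>
#include <string.h>

static uint16_t rotl16(uint16_t x, unsigned n)
{
    return (uint16_t)((x << n) | (x >> (16 - n)));
}

static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

static uint64_t rotl64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64 - n));
}

/* Swap the two bytes of a 16-bit element (the REVB view). */
static uint16_t rev_bytes16(uint16_t x)
{
    uint8_t b[2];
    memcpy(b, &x, sizeof b);
    uint8_t t = b[0]; b[0] = b[1]; b[1] = t;
    memcpy(&x, b, sizeof b);
    return x;
}

/* Swap the two halfwords of a 32-bit element (the REVH view). */
static uint32_t rev_halfwords32(uint32_t x)
{
    uint16_t h[2];
    memcpy(h, &x, sizeof h);
    uint16_t t = h[0]; h[0] = h[1]; h[1] = t;
    memcpy(&x, h, sizeof h);
    return x;
}

/* Swap the two words of a 64-bit element (the REVW view). */
static uint64_t rev_words64(uint64_t x)
{
    uint32_t w[2];
    memcpy(w, &x, sizeof w);
    uint32_t t = w[0]; w[0] = w[1]; w[1] = t;
    memcpy(&x, w, sizeof w);
    return x;
}
```

Swapping two equal halves gives the same result on either endianness, so the
memcpy-based helpers are only there to make the "element permutation" view
explicit.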

Finally, it would allow us to experiment with more optimal SVE-specific rotate
sequences in the future.
For example, we could consider emitting high-throughput TBLs for rotates that
are a multiple of 8.
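
To sketch why a table lookup works for those amounts: a rotate by a multiple of
8 bits only moves whole bytes, so it can be described by a per-element byte
index table. The code below is a scalar model for a 32-bit element under my own
assumptions (names are hypothetical, byte 0 is the least significant byte); it
says nothing about how the actual TBL operands would be built.

```c
#include <stdint.h>

enum { LANE_BYTES = 4 };

/* Fill idx[] so that result byte j is taken from source byte idx[j] for a
   rotate-left by `bits`, where bits is a multiple of 8. */
static void rotl_byte_indices(unsigned bits, uint8_t idx[LANE_BYTES])
{
    unsigned k = (bits / 8) % LANE_BYTES;   /* whole bytes to rotate by */
    for (unsigned j = 0; j < LANE_BYTES; j++)
        idx[j] = (uint8_t)((j + LANE_BYTES - k) % LANE_BYTES);
}

/* Apply the byte permutation to one 32-bit element, in the manner of a
   table lookup. */
static uint32_t permute_bytes32(uint32_t x, const uint8_t idx[LANE_BYTES])
{
    uint32_t r = 0;
    for (unsigned j = 0; j < LANE_BYTES; j++) {
        uint32_t byte = (x >> (8 * idx[j])) & 0xFFu;
        r |= byte << (8 * j);
    }
    return r;
}
```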

I’m suggesting doing this for TARGET_SVE2 as we have the combined USRA
instruction there, but I wouldn’t object to doing this for TARGET_SVE.

Thanks,
Kyrill
