On 10/31/24 9:50 AM, Kyrylo Tkachov wrote:
Is there a reason why we don't have the target reject the vector rotation cases
it can't natively handle and the expander code would then try the rotate via
permuation?
It seems like your patch defers everything to the target which calls back into
the expander code for the special case.
There’s two reasons:
* In the cases I’ve been looking at the rotate is only detected in post-expand
RTL passes so the post-combine split is the first place where we even have the
chance to make that decision.
* On AArch64 once the rotate has been detected we want to give combine an
opportunity to propagate it further into a combined XOR+ROTATE instruction
(XAR) and only resort to what this patch is doing if no such combination was
made.
Fine reasons.
I think the generic rotate expansion code could use the new routine in the
future but the target needs to be heavily involved in the decision.
More broadly, the expansion sequence for these rotates on aarch64 can be:
* Emitting a shift-left + combined shift+accumulate
* For some rotate amounts it’s a single REV* instruction
* For targets that supports certain ISA extensions it’s a zeroing instruction +
XAR instruction
* For byte multiple rotate amounts it’s a permute mask creation + vector
permute.
So using something like RTX costs to guide these decisions would be somewhat
unwieldy.
Perhaps, but it's the right way to go if there are cases we can see at
expansion time. Better to generate good code early as long as it
doesn't inhibit later optimization.
Anyway, thanks for the answers. OK for the trunk.
jeff