On 10/31/24 9:50 AM, Kyrylo Tkachov wrote:

Is there a reason why we don't have the target reject the vector rotation cases 
it can't natively handle and the expander code would then try the rotate via 
permuation?

It seems like your patch defers everything to the target which calls back into 
the expander code for the special case.

There’s two reasons:
* In the cases I’ve been looking at the rotate is only detected in post-expand 
RTL passes so the post-combine split is the first place where we even have the 
chance to make that decision.
* On AArch64 once the rotate has been detected we want to give combine an 
opportunity to propagate it further into a combined XOR+ROTATE instruction 
(XAR) and only resort to what this patch is doing if no such combination was 
made.
Fine reasons.


I think the generic rotate expansion code could use the new routine in the 
future but the target needs to be heavily involved in the decision.
More broadly, the expansion sequence for these rotates on aarch64 can be:
* Emitting a shift-left + combined shift+accumulate
* For some rotate amounts it’s a single REV* instruction
* For targets that supports certain ISA extensions it’s a zeroing instruction + 
XAR instruction
* For byte multiple rotate amounts it’s a permute mask creation + vector 
permute.

So using something like RTX costs to guide these decisions would be somewhat 
unwieldy.
Perhaps, but it's the right way to go if there are cases we can see at expansion time. Better to generate good code early as long as it doesn't inhibit later optimization.

Anyway, thanks for the answers.  OK for the trunk.

jeff

Reply via email to