Hi all,

Some vector rotate operations can be implemented in a single instruction rather than using the fallback SHL+USRA sequence. In particular, when the rotate amount is half the bitwidth of the element we can use a REV64, REV32 or REV16 instruction. This patch adds this transformation to the recently added splitter for vector rotates.

I've also received requests to optimise vector rotates by any amount that is a multiple of 8 into a TBL, i.e. a vector permute, because the permute constant can be hoisted out of hot paths and TBL instructions have high throughput on modern cores. It is an interesting idea, but as it does not strictly use fewer instructions I have not implemented it here. It's something to consider.
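To illustrate why the REV forms are valid, here is a small scalar sketch of the identity being used: rotating an element by half its width is the same as exchanging its two halves, which is what REV64, REV32 and REV16 do per element when operating on sub-elements of half the size. The helper names (rotl64, rev_model64, rev_rotate_selfcheck and so on) are made up for this illustration and are not part of the patch; the code models one lane at a time rather than a full vector register.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar rotate-left helpers, one per element width (0 < n < width). */
static uint64_t rotl64(uint64_t x, unsigned n) { return (x << n) | (x >> (64 - n)); }
static uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }
static uint16_t rotl16(uint16_t x, unsigned n) { return (uint16_t)((x << n) | (x >> (16 - n))); }

/* Per-lane model of REV64 on 32-bit sub-elements: exchange the two words. */
static uint64_t rev_model64(uint64_t x) {
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);
    return ((uint64_t)lo << 32) | hi;
}

/* Per-lane model of REV32 on 16-bit sub-elements: exchange the two halfwords. */
static uint32_t rev_model32(uint32_t x) {
    uint16_t lo = (uint16_t)x, hi = (uint16_t)(x >> 16);
    return ((uint32_t)lo << 16) | hi;
}

/* Per-lane model of REV16 on 8-bit sub-elements: exchange the two bytes. */
static uint16_t rev_model16(uint16_t x) {
    uint8_t lo = (uint8_t)x, hi = (uint8_t)(x >> 8);
    return (uint16_t)(((unsigned)lo << 8) | hi);
}

/* Hypothetical self-check: rotate by half the width equals the half swap. */
int rev_rotate_selfcheck(void) {
    const uint64_t samples[] = {
        0x0123456789ABCDEFull, 0x1ull,
        0x8000000000000000ull, 0xFFFFFFFF00000000ull
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint64_t v = samples[i];
        assert(rotl64(v, 32) == rev_model64(v));
        assert(rotl32((uint32_t)v, 16) == rev_model32((uint32_t)v));
        assert(rotl16((uint16_t)v, 8) == rev_model16((uint16_t)v));
    }
    return 1;
}
```

Because a rotate by exactly half the width is its own inverse, the same identity holds whether the rotate is expressed as a left or a right rotate.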
I'm also adding an expander for the rotl<mode>3 standard name. In some cases the vector rotate is detected very early on (even before GIMPLE?), for example when using GNU vector extensions:

uint64x2_t G1 (uint64x2_t r) {
    return (r >> 32) | (r << 32);
}

This gets optimised into a rotate of r by 32 fairly early on. Because we do not have an expander for such vector rotates, the expand pass synthesises it with RTL operations that end up generating the SHL+USRA sequence. It seems wasteful to expand it to multiple RTL ops only to then try to combine them back into a ROTATE during combine. Better to emit a simple ROTATE-by-vector-constant RTX to give the early RTL passes a chance to optimise it or combine it into something.

Bootstrapped and tested on aarch64-none-linux-gnu.
As with patch [2/3], I'm interested in feedback on the approach.
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

gcc/

	* config/aarch64/aarch64-protos.h (aarch64_emit_opt_vec_rotate):
	Declare prototype.
	* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Implement.
	* config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>):
	Call the above.
	(rotl<mode>3): New define_expand.

gcc/testsuite/

	* gcc.target/aarch64/simd/pr117048_2.c: New test.
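For anyone who wants to play with the GNU vector extension example without an AArch64 toolchain, here is a target-independent sketch. It assumes a compiler that supports GNU vector extensions (GCC or Clang); the u64x2 typedef is a stand-in for arm_neon.h's uint64x2_t, and g1_selfcheck is a hypothetical helper that is not part of the patch or its testsuite. It only checks that the shift-and-or idiom is a lane-wise rotate by 32; which instructions get generated depends on the target and compiler version.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for uint64x2_t so the sketch builds on any target. */
typedef uint64_t u64x2 __attribute__((vector_size(16)));

/* Same idiom as in the mail: each 64-bit lane is rotated by 32. */
u64x2 G1(u64x2 r) {
    return (r >> 32) | (r << 32);
}

/* Hypothetical self-check comparing each lane against a scalar rotate. */
int g1_selfcheck(void) {
    const uint64_t a = 0x0123456789ABCDEFull;
    const uint64_t b = 0x00000000FFFFFFFFull;
    u64x2 in = { a, b };
    u64x2 out = G1(in);
    assert(out[0] == ((a << 32) | (a >> 32)));
    assert(out[1] == ((b << 32) | (b >> 32)));
    return 1;
}
```

Comparing the output of -O2 -S on an AArch64 compiler with and without the patch should show whether the SHL+USRA pair has been replaced.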
0003-aarch64-Optimize-vector-rotates-into-REV-instruction.patch