Hi all,

Some vector rotate operations can be implemented in a single instruction rather than with the fallback SHL+USRA sequence. In particular, when the rotate amount is half the bit width of the element, we can use a REV64, REV32 or REV16 instruction. This patch adds that transformation to the recently added splitter for vector rotates.

Bootstrapped and tested on aarch64-none-linux-gnu.
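To illustrate why the rotate-by-half-width case maps onto REV, here is a minimal scalar sketch in portable C. It is not the code in the patch: the helper names (rotl16_by_8, swap_halves32, rev_width_for_rotate and so on) are hypothetical and only model the per-element effect. Rotating an element by half its width swaps its two halves, which is the effect REV64/REV32/REV16 has on each container when the vector is viewed with lanes of half the element size.

```c
#include <assert.h>
#include <stdint.h>

/* Plain rotates by half the element width (what the SHL+USRA fallback
   computes per element).  */
static uint16_t rotl16_by_8 (uint16_t x)
{
  return (uint16_t) ((x << 8) | (x >> 8));
}

static uint32_t rotl32_by_16 (uint32_t x)
{
  return (x << 16) | (x >> 16);
}

static uint64_t rotl64_by_32 (uint64_t x)
{
  return (x << 32) | (x >> 32);
}

/* Models of the per-container effect of REV16/REV32/REV64 when the
   lanes are half the container size: the two halves trade places.  */
static uint16_t swap_halves16 (uint16_t x)
{
  uint8_t lo = (uint8_t) x;
  uint8_t hi = (uint8_t) (x >> 8);
  return (uint16_t) (((uint16_t) lo << 8) | hi);
}

static uint32_t swap_halves32 (uint32_t x)
{
  uint16_t lo = (uint16_t) x;
  uint16_t hi = (uint16_t) (x >> 16);
  return ((uint32_t) lo << 16) | hi;
}

static uint64_t swap_halves64 (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return ((uint64_t) lo << 32) | hi;
}

/* Hypothetical helper, an assumption for illustration only: return the
   REV container width (64, 32 or 16) usable for a rotate of AMOUNT bits
   on ELEM_BITS-wide elements, or 0 if the rotate is not by exactly half
   the element width (or the element is too narrow for a REV).  */
static unsigned rev_width_for_rotate (unsigned elem_bits, unsigned amount)
{
  if (elem_bits != 16 && elem_bits != 32 && elem_bits != 64)
    return 0;
  return amount * 2 == elem_bits ? elem_bits : 0;
}
```

For example, under this model a rotate of 32-bit elements by 16 gives the same result as swap_halves32 on every element, while a rotate by any other amount still needs the generic sequence.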
Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

gcc/

	* config/aarch64/aarch64-protos.h (aarch64_emit_opt_vec_rotate):
	Declare prototype.
	* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Implement.
	* config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>):
	Call the above.

gcc/testsuite/

	* gcc.target/aarch64/simd/pr117048_2.c: New test.
v2-0004-aarch64-Optimize-vector-rotates-into-REV-instruction.patch