Hi all,

Some vector rotate operations can be implemented in a single instruction
rather than using the fallback SHL+USRA sequence.
In particular, when the rotate amount is half the bitwidth of the element,
we can use a single REV64, REV32 or REV16 instruction.
This patch adds this transformation in the recently added splitter for vector
rotates.
Bootstrapped and tested on aarch64-none-linux-gnu.
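
For illustration only, here is a scalar C sketch of why the transformation is
valid: rotating an element by half its width just swaps its two halves, which
is the per-container reversal a REV instruction performs on narrower lanes.
The function names are hypothetical and not part of the patch, and the lane
arrangement noted in each comment is my reading of the REV semantics rather
than a quote from the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate-by-half-width written in the generic shift-and-or form. */
static uint16_t rotl16_by_8(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

static uint32_t rotl32_by_16(uint32_t x)
{
    return (x << 16) | (x >> 16);
}

static uint64_t rotl64_by_32(uint64_t x)
{
    return (x << 32) | (x >> 32);
}

/* The same results expressed as "reverse the two halves of the container".
   Assumed correspondence: REV16 on byte lanes, REV32 on 16-bit lanes,
   REV64 on 32-bit lanes.  Swapping two halves gives the same value on
   either endianness.  */
static uint16_t swap_halves16(uint16_t x)
{
    uint8_t h[2];
    memcpy(h, &x, sizeof h);
    uint8_t t = h[0]; h[0] = h[1]; h[1] = t;
    memcpy(&x, h, sizeof x);
    return x;
}

static uint32_t swap_halves32(uint32_t x)
{
    uint16_t h[2];
    memcpy(h, &x, sizeof h);
    uint16_t t = h[0]; h[0] = h[1]; h[1] = t;
    memcpy(&x, h, sizeof x);
    return x;
}

static uint64_t swap_halves64(uint64_t x)
{
    uint32_t h[2];
    memcpy(h, &x, sizeof h);
    uint32_t t = h[0]; h[0] = h[1]; h[1] = t;
    memcpy(&x, h, sizeof x);
    return x;
}

/* Check that both formulations agree on a few sample values. */
static void check_rev_equivalence(void)
{
    assert(rotl16_by_8(0xABCDu) == swap_halves16(0xABCDu));
    assert(rotl32_by_16(0x11223344u) == swap_halves32(0x11223344u));
    assert(rotl64_by_32(0x1122334455667788ull)
           == swap_halves64(0x1122334455667788ull));
}
```

Calling check_rev_equivalence () from a small driver should pass silently. The
vector case applies the same swap independently to every element.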

Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

gcc/

        * config/aarch64/aarch64-protos.h (aarch64_emit_opt_vec_rotate):
        Declare prototype.
        * config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Implement.
        * config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>):
        Call the above.

gcc/testsuite/

        * gcc.target/aarch64/simd/pr117048_2.c: New test.

Attachment: v2-0004-aarch64-Optimize-vector-rotates-into-REV-instruction.patch