[PATCH 3/3] aarch64: Optimize vector rotates into REV* instructions where possible

Kyrylo Tkachov Wed, 16 Oct 2024 06:57:24 -0700

Hi all,

Some vector rotate operations can be implemented in a single instruction
rather than using the fallback SHL+USRA sequence.
In particular, when the rotate amount is half the bitwidth of the element
we can use a REV64, REV32 or REV16 instruction.
This patch adds this transformation to the recently added splitter for vector
rotates.  I've also received requests to optimise vector rotates by any amount
that is a multiple of 8 into a TBL, i.e. a vector permute, because the permute
constant can be hoisted outside of hot paths and TBL instructions have high
throughput on modern cores.  It is an interesting idea, but as it's not
strictly fewer instructions I have not implemented it here.  It is still
something to consider.
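
To illustrate the equivalence, here is a minimal scalar sketch in plain C.  It
is not code from the patch: all function names are made up for illustration,
each function models a single element rather than a whole vector, and byte 0 is
assumed to be the least significant byte of an element.  It shows that a rotate
by half the element width is the same as swapping the two halves (what the REV*
instructions do per container), and how the byte indices for a TBL-style
permute could be derived for a rotate by a multiple of 8.

```c
#include <assert.h>
#include <stdint.h>

/* Per-element rotate-left; n must be in 1..width-1.  */
uint16_t rotl16 (uint16_t x, unsigned n)
{
  return (uint16_t) (((uint32_t) x << n) | ((uint32_t) x >> (16 - n)));
}

uint32_t rotl32 (uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

uint64_t rotl64 (uint64_t x, unsigned n)
{
  return (x << n) | (x >> (64 - n));
}

/* Model of REV16 on byte lanes: swap the two bytes of a 16-bit element.  */
uint16_t rev16_bytes (uint16_t x)
{
  uint8_t lo = (uint8_t) x, hi = (uint8_t) (x >> 8);
  return (uint16_t) (((uint16_t) lo << 8) | hi);
}

/* Model of REV32 on 16-bit lanes: swap the halfwords of a 32-bit element.  */
uint32_t rev32_halves (uint32_t x)
{
  uint16_t lo = (uint16_t) x, hi = (uint16_t) (x >> 16);
  return ((uint32_t) lo << 16) | hi;
}

/* Model of REV64 on 32-bit lanes: swap the words of a 64-bit element.  */
uint64_t rev64_words (uint64_t x)
{
  uint32_t lo = (uint32_t) x, hi = (uint32_t) (x >> 32);
  return ((uint64_t) lo << 32) | hi;
}

/* Hypothetical helper: fill IDX with the source byte index for each result
   byte of a 16-byte vector whose ELEM_BYTES-wide elements are rotated left
   by ROT_BITS (a multiple of 8).  Result byte j of an element comes from
   source byte (j - k) mod ELEM_BYTES, where k = ROT_BITS / 8.  */
void rot_tbl_indices (uint8_t idx[16], unsigned elem_bytes, unsigned rot_bits)
{
  assert (rot_bits % 8 == 0 && elem_bytes != 0 && 16 % elem_bytes == 0);
  unsigned k = (rot_bits / 8) % elem_bytes;
  for (unsigned i = 0; i < 16; i++)
    {
      unsigned base = i - i % elem_bytes;  /* First byte of this element.  */
      unsigned j = i % elem_bytes;         /* Byte position inside it.  */
      idx[i] = (uint8_t) (base + (j + elem_bytes - k) % elem_bytes);
    }
}
```

For example, rotl64 (x, 32) and rev64_words (x) agree for every x, and
rot_tbl_indices (idx, 4, 8) starts with the indices 3, 0, 1, 2 for the first
32-bit element.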


I'm also adding an expander for the rotl<mode>3 standard name.
In some cases the vector rotate is detected very early on (even before
GIMPLE?), for example when using GNU vector extensions:
uint64x2_t G1 (uint64x2_t r) {
    return (r >> 32) | (r << 32);
}
This gets optimised into a rotate of r by 32 fairly early on.  Because we do
not have an expander for such vector rotates, the expand pass synthesises it
with RTL operations that end up generating the SHL+USRA sequence.  It seems
wasteful to expand it to multiple RTL ops only to then try to combine them
back into a ROTATE during combine.  Better to emit a simple
ROTATE-by-vector-constant RTX to give the early RTL passes a chance to
optimise it or combine it into something.
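
For reference, the reason the SHL+USRA fallback computes a rotate can be
sketched per lane in plain C.  This is only a scalar model with made-up
function names, not code from the patch: SHL is modelled as a left shift and
USRA as an unsigned right shift whose result is added to the accumulator.  The
two shifted values never have a set bit in the same position, so the add
behaves like an OR.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SHL on one 64-bit lane: shift left by an immediate.  */
uint64_t shl_lane (uint64_t x, unsigned n)
{
  return x << n;
}

/* Model of USRA on one 64-bit lane: unsigned shift right by an immediate,
   then accumulate (add) into ACC.  */
uint64_t usra_lane (uint64_t acc, uint64_t x, unsigned n)
{
  return acc + (x >> n);
}

/* Rotate left by N (1..63) using the two-instruction fallback sequence.  */
uint64_t rotl_via_shl_usra (uint64_t x, unsigned n)
{
  assert (n >= 1 && n <= 63);
  uint64_t tmp = shl_lane (x, n);   /* High part of the result.  */
  return usra_lane (tmp, x, 64 - n); /* Add in the wrapped-around low part.  */
}

/* Reference rotate for comparison; N must be in 1..63.  */
uint64_t rotl_ref (uint64_t x, unsigned n)
{
  return (x << n) | (x >> (64 - n));
}
```

With n equal to 32 this gives the same result as swapping the two 32-bit
halves of the lane, which is the case the patch turns into a single REV64.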

Bootstrapped and tested on aarch64-none-linux-gnu.
As with patch [2/3], I'm interested in feedback on the approach.

Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

gcc/

        * config/aarch64/aarch64-protos.h (aarch64_emit_opt_vec_rotate):
        Declare prototype.
        * config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Implement.
        * config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>):
        Call the above.
        (rotl<mode>3): New define_expand.

gcc/testsuite/

        * gcc.target/aarch64/simd/pr117048_2.c: New test.

0003-aarch64-Optimize-vector-rotates-into-REV-instruction.patch
