On 10/27/24 10:21 AM, Kyrylo Tkachov wrote:
Hi all,

simplify-rtx can transform (X << C1) | (X >> C2) into ROTATE (X, C1) when
C1 + C2 == mode-width.  The transformation is also valid for PLUS and XOR,
because the two shifted arms have no set bits in common.  GIMPLE can already
do this fold.  Let's teach RTL to do it too.
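
As a rough scalar illustration of why the three operators are interchangeable
here (this is not code from the patch; the function names are hypothetical
and a 64-bit element is assumed), the arms of the expression occupy disjoint
bit positions, so XOR cancels nothing and PLUS generates no carry:

```c
#include <assert.h>
#include <stdint.h>

/* Reference rotate-left written with IOR.  c must be in 1..63 so that
   neither shift count reaches the type width.  */
static uint64_t rotl64_ior (uint64_t x, unsigned c)
{
  return (x << c) | (x >> (64 - c));
}

/* Same two arms joined with XOR: the arms share no set bits, so no bit
   is cancelled.  */
static uint64_t rotl64_xor (uint64_t x, unsigned c)
{
  return (x << c) ^ (x >> (64 - c));
}

/* Same two arms joined with PLUS: with no overlapping bits there is no
   carry, so the sum equals the IOR.  */
static uint64_t rotl64_plus (uint64_t x, unsigned c)
{
  return (x << c) + (x >> (64 - c));
}

/* Check that all three forms agree for every valid rotate count.  */
static void check_rotate_forms (uint64_t x)
{
  for (unsigned c = 1; c < 64; c++)
    {
      assert (rotl64_xor (x, c) == rotl64_ior (x, c));
      assert (rotl64_plus (x, c) == rotl64_ior (x, c));
    }
}
```

Calling check_rotate_forms on a few arbitrary values exercises every count
from 1 to 63.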

The motivating testcase for this is in AArch64 intrinsics:

uint64x2_t G2(uint64x2_t a, uint64x2_t b) {
     uint64x2_t c = veorq_u64(a, b);
     return veorq_u64(vaddq_u64(c, c), vshrq_n_u64(c, 63));
}

which I was hoping to fold to a single XAR (a ROTATE+XOR instruction) but
GCC was failing to detect the rotate operation for two reasons:
1) The two arms of the expression are combined under XOR rather than the
IOR that simplify-rtx currently supports.
2) The ASHIFT operation is actually expressed as (PLUS X X), a left shift
by one, and thus is not detected as the left-shift arm that we require.
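
A scalar model of one 64-bit lane of G2 may make the two points concrete.
This is only a sketch with hypothetical helper names, not the RTL that GCC
sees: the left arm is written as c + c, the arms are joined with XOR, and
the result still matches an XOR followed by a rotate left by one.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of G2 as written in the testcase: the shift left by one
   appears as c + c, and the two arms are combined with XOR.  */
static uint64_t g2_lane (uint64_t a, uint64_t b)
{
  uint64_t c = a ^ b;
  return (c + c) ^ (c >> 63);
}

/* The form the fold is aiming for: XOR the inputs, then rotate the
   result left by 1 (1 + 63 == 64, the element width).  */
static uint64_t xor_then_rotl1 (uint64_t a, uint64_t b)
{
  uint64_t c = a ^ b;
  return (c << 1) | (c >> 63);
}

/* Check that both formulations give the same lane value.  */
static void check_g2_lane (uint64_t a, uint64_t b)
{
  assert (g2_lane (a, b) == xor_then_rotl1 (a, b));
}
```

For example, with a = 0x8000000000000000 and b = 0 the sum c + c wraps to 0
and c >> 63 is 1, so the lane value is 1, which is the top bit rotated
around to bit 0.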

The patch fixes both issues.  The analysis of the two arms of the rotation
expression is factored out into a common helper, simplify_rotate_op, which is
then used in the PLUS, XOR and IOR cases in simplify_binary_operation_1.

The check-assembly testcase for this is added in the following patch because
it needs some extra AArch64 backend work, but I've added self-tests in this
patch to validate the transformation.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov <ktac...@nvidia.com>

        PR target/117048
        * simplify-rtx.cc (extract_ashift_operands_p): Define.
        (simplify_rotate_op): Likewise.
        (simplify_context::simplify_binary_operation_1): Use the above in
        the PLUS, IOR, XOR cases.
        (test_vector_rotate): Define.
        (test_vector_ops): Use the above.
OK
jeff
