Pengfei Li <pengfei....@arm.com> writes:
> This patch transforms RTL expressions of the form (subreg (not X) off)
> into (not (subreg X off)) when the subreg is an operand of a bitwise AND
> or OR.  This transformation can expose opportunities to combine a NOT
> operation with the bitwise AND/OR.
>
> For example, it improves the codegen of the following AArch64 NEON
> intrinsics:
>
>   vandq_s64(vreinterpretq_s64_s32(vmvnq_s32(a)),
>             vreinterpretq_s64_s32(b));
>
> from:
>
>   not  v0.16b, v0.16b
>   and  v0.16b, v0.16b, v1.16b
>
> to:
>
>   bic  v0.16b, v1.16b, v0.16b
>
> Regression tested on x86_64-linux-gnu, arm-linux-gnueabihf and
> aarch64-linux-gnu.
>
> gcc/ChangeLog:
>
>         * simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
>         Add RTX simplification for bitwise AND/OR.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/simd/bic_orn_1.c: New test.
Thanks for the patch.  I agree we should have this simplification, but
some comments below.

> ---
>  gcc/simplify-rtx.cc                             | 24 +++++++++++++++++++
>  .../gcc.target/aarch64/simd/bic_orn_1.c         | 17 +++++++++++++
>  2 files changed, 41 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/bic_orn_1.c
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 88d31a71c05..ed620ef5d45 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -3738,6 +3738,18 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>           && rtx_equal_p (XEXP (XEXP (op0, 0), 0), op1))
>         return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
>
> +      /* Convert (ior (subreg (not X) off) Y) into (ior (not (subreg X off)) Y)
> +         to expose opportunities to combine IOR and NOT.  */
> +      if (GET_CODE (op0) == SUBREG
> +          && GET_CODE (SUBREG_REG (op0)) == NOT)
> +        {
> +          rtx new_subreg = gen_rtx_SUBREG (mode,
> +                                           XEXP (SUBREG_REG (op0), 0),
> +                                           SUBREG_BYTE (op0));
> +          rtx new_not = simplify_gen_unary (NOT, mode, new_subreg, mode);
> +          return simplify_gen_binary (IOR, mode, new_not, op1);
> +        }
> +
>       tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>       if (tem)
>         return tem;
> @@ -4274,6 +4286,18 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>           return simplify_gen_binary (LSHIFTRT, mode, XEXP (op0, 0), XEXP (op0, 1));
>         }
>
> +      /* Convert (and (subreg (not X) off) Y) into (and (not (subreg X off)) Y)
> +         to expose opportunities to combine AND and NOT.  */
> +      if (GET_CODE (op0) == SUBREG

I think we should also check !paradoxical_subreg_p (op0).  There are
two reasons:

(1) (and (subreg (something-narrower)) (const_int mask)) is a common
    zero-extension idiom.  Pushing the subreg down into the first
    operand would prevent that.

(2) Applying the rule in the paradoxical case would compute the NOT in
    a wider mode, which might be more expensive.

> +          && GET_CODE (SUBREG_REG (op0)) == NOT)
> +        {
> +          rtx new_subreg = gen_rtx_SUBREG (mode,
> +                                           XEXP (SUBREG_REG (op0), 0),
> +                                           SUBREG_BYTE (op0));

This should use simplify_gen_subreg instead of gen_rtx_SUBREG.

> +          rtx new_not = simplify_gen_unary (NOT, mode, new_subreg, mode);
> +          return simplify_gen_binary (AND, mode, new_not, op1);
> +        }
> +

The NOT might be on op1 rather than op0 in cases where the other operand
is a nested AND, so I think we want to handle both op0 and op1 in the
same way.  We might as well do the same thing for XOR at the same time.

It also seems worth splitting the code out into a subroutine, to avoid
cut-&-paste and accidental divergence; see the (untested) sketch at the
end of this message.

Thanks,
Richard

>       tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>       if (tem)
>         return tem;
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/bic_orn_1.c b/gcc/testsuite/gcc.target/aarch64/simd/bic_orn_1.c
> new file mode 100644
> index 00000000000..1c66f21424e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/bic_orn_1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include <arm_neon.h>
> +
> +int64x2_t bic_16b (int32x4_t a, int32x4_t b) {
> +  return vandq_s64 (vreinterpretq_s64_s32 (vmvnq_s32 (a)),
> +                    vreinterpretq_s64_s32 (b));
> +}
> +
> +int16x4_t orn_8b (int32x2_t a, int32x2_t b) {
> +  return vorr_s16 (vreinterpret_s16_s32 (a),
> +                   vreinterpret_s16_s32 (vmvn_s32 (b)));
> +}
> +
> +/* { dg-final { scan-assembler {\tbic\tv[0-9]+\.16b} } } */
> +/* { dg-final { scan-assembler {\torn\tv[0-9]+\.8b} } } */
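
To make the suggestion concrete, the sort of helper I had in mind is
below.  It is only an untested sketch (the name and exact interface are
placeholders), but it handles AND, IOR and XOR through one CODE
parameter, tries both operands, rejects paradoxical subregs, and uses
simplify_gen_subreg:

/* Subroutine of simplify_binary_operation_1.  CODE is a bitwise AND,
   IOR or XOR.  If one of OP0 and OP1 is a non-paradoxical subreg of
   a NOT, push the subreg into the NOT's operand:

     (code (subreg (not X) off) Y) -> (code (not (subreg X off)) Y)

   so that later simplifications can combine the NOT with CODE.
   Return the simplified rtx, or null on failure.  */

static rtx
simplify_with_subreg_not (rtx_code code, machine_mode mode,
                          rtx op0, rtx op1)
{
  /* Canonicalize so that any subreg-of-NOT operand is op0.  */
  if (GET_CODE (op0) != SUBREG || GET_CODE (SUBREG_REG (op0)) != NOT)
    std::swap (op0, op1);
  if (GET_CODE (op0) != SUBREG
      || GET_CODE (SUBREG_REG (op0)) != NOT
      /* Don't disturb (and (subreg X) (const_int M)) zero-extension
         idioms, and don't compute the NOT in a wider mode.  */
      || paradoxical_subreg_p (op0))
    return NULL_RTX;

  rtx inner = XEXP (SUBREG_REG (op0), 0);
  rtx new_subreg = simplify_gen_subreg (mode, inner, GET_MODE (inner),
                                        SUBREG_BYTE (op0));
  if (!new_subreg)
    return NULL_RTX;
  rtx new_not = simplify_gen_unary (NOT, mode, new_subreg, mode);
  return simplify_gen_binary (code, mode, new_not, op1);
}

The AND, IOR and XOR cases would then just call this and return the
result when it is nonnull.  Note that the recursion through
simplify_gen_binary terminates, since the transformed operand is a NOT
rather than a SUBREG.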