Thanks for doing this. Alex Coplan <alex.cop...@arm.com> writes: > Hello, > > The attached patch adds an optimization to the AArch64 backend to catch > additional cases where we can use csinv and csneg. > > Given the C code: > > unsigned long long inv(unsigned a, unsigned b, unsigned c) > { > return a ? b : ~c; > } > > Prior to this patch, AArch64 GCC at -O2 generates: > > inv: > cmp w0, 0 > mvn w2, w2 > csel w0, w1, w2, ne > ret > > and after applying the patch, we get: > > inv: > cmp w0, 0 > csinv w0, w1, w2, ne > ret > > The new pattern also catches the optimization for the symmetric case where the > body of foo reads a ? ~b : c.
Yeah. The thing that surprised me was that the non-extending form has the operator in the "then" arm of the if_then_else: (define_insn "*csinv3<mode>_insn" [(set (match_operand:GPI 0 "register_operand" "=r") (if_then_else:GPI (match_operand 1 "aarch64_comparison_operation" "") (not:GPI (match_operand:GPI 2 "register_operand" "r")) (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))] "" "csinv\\t%<w>0, %<w>3, %<w>2, %M1" [(set_attr "type" "csel")] ) whereas the new pattern has it in the "else" arm. I think for the non-extending form, having the operator in the "then" arm really is canonical and close to guaranteed, since that's how ifcvt processes half diamonds. But when both values are zero-extended, ifcvt has to convert a full diamond, and I'm not sure we can rely on the two sides coming out in a particular order. I think the two ?: orders above work with one pattern thanks to simplifications that happen before entering gimple. If instead the operator is split out into a separate statement: unsigned long long inv1(int a, unsigned b, unsigned c) { unsigned d = ~c; return a ? b : d; } then we go back to using the less optimal CSEL sequence. So I think it would be worth having a second pattern for the opposite order. Alternatively, we could add a new rtl canonicalisation rule to force the if_then_else operands to end up in a particular order, but that's more complex and is likely to affect other targets. > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index c7c4d1dd519..2f7367c0b1a 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -4390,6 +4390,19 @@ > [(set_attr "type" "csel")] > ) > > +(define_insn "*csinv3_uxtw_insn" > + [(set (match_operand:DI 0 "register_operand" "=r") > + (if_then_else:DI > + (match_operand 1 "aarch64_comparison_operation" "") > + (zero_extend:DI > + (match_operand:SI 2 "aarch64_reg_or_zero" "rZ")) > + (zero_extend:DI > + (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))] > + "" > + "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1" > + [(set_attr "type" "csel")] > +) > + The operand to a zero_extend can never be a const_int, so operand 2 should just be a register_operand with an "r" constraint. At the risk of feature creep :-) a useful third pattern could be to combine a zero-extended operator result with an existing DImode value. In that case, the existing DImode value really can be "rZ" and should always be in the "else" arm of the if_then_else. E.g.: unsigned long long f(int a, unsigned long b, unsigned c) { return a ? b : ~c; } unsigned long long g(int a, unsigned c) { return a ? 0 : ~c; } But that's definitely something that can be left for later. Thanks, Richard