Thanks for doing this.

Alex Coplan <alex.cop...@arm.com> writes:
> Hello,
>
> The attached patch adds an optimization to the AArch64 backend to catch
> additional cases where we can use csinv and csneg.
>
> Given the C code:
>
> unsigned long long inv(unsigned a, unsigned b, unsigned c)
> {
>   return a ? b : ~c;
> }
>
> Prior to this patch, AArch64 GCC at -O2 generates:
>
> inv:
>         cmp     w0, 0
>         mvn     w2, w2
>         csel    w0, w1, w2, ne
>         ret
>
> and after applying the patch, we get:
>
> inv:
>         cmp     w0, 0
>         csinv   w0, w1, w2, ne
>         ret
>
> The new pattern also catches the optimization for the symmetric case where the
> body of inv reads a ? ~b : c.

Yeah.  The thing that surprised me was that the non-extending form
has the operator in the "then" arm of the if_then_else:

(define_insn "*csinv3<mode>_insn"
  [(set (match_operand:GPI 0 "register_operand" "=r")
        (if_then_else:GPI
          (match_operand 1 "aarch64_comparison_operation" "")
          (not:GPI (match_operand:GPI 2 "register_operand" "r"))
          (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
  ""
  "csinv\\t%<w>0, %<w>3, %<w>2, %M1"
  [(set_attr "type" "csel")]
)

whereas the new pattern has it in the "else" arm.  I think for the
non-extending form, having the operator in the "then" arm really is
canonical and close to guaranteed, since that's how ifcvt processes
half diamonds.

But when both values are zero-extended, ifcvt has to convert a full
diamond, and I'm not sure we can rely on the two sides coming out in
a particular order.  I think the two ?: orders above work with one pattern
thanks to simplifications that happen before entering gimple.  If instead
the operator is split out into a separate statement:

unsigned long long
inv1(int a, unsigned b, unsigned c)
{
  unsigned d = ~c;
  return a ? b : d;
}

then we go back to using the less optimal CSEL sequence.

So I think it would be worth having a second pattern for the opposite order.
Alternatively, we could add a new rtl canonicalisation rule to force the
if_then_else operands to end up in a particular order, but that's more
complex and is likely to affect other targets.
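For concreteness, the opposite-order pattern might look something like the
following (an untested sketch; the name is made up).  Note that with the
operator in the "then" arm, the output template has to pick the other
operand and invert the condition with %M1, as the existing
*csinv3<mode>_insn does:

(define_insn "*csinv3_uxtw_insn2"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (if_then_else:DI
          (match_operand 1 "aarch64_comparison_operation" "")
          (zero_extend:DI
            (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
          (zero_extend:DI
            (match_operand:SI 3 "register_operand" "r"))))]
  ""
  "cs<neg_not_cs>\\t%w0, %w3, %w2, %M1"
  [(set_attr "type" "csel")]
)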

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index c7c4d1dd519..2f7367c0b1a 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4390,6 +4390,19 @@
>    [(set_attr "type" "csel")]
>  )
>  
> +(define_insn "*csinv3_uxtw_insn"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +        (if_then_else:DI
> +          (match_operand 1 "aarch64_comparison_operation" "")
> +          (zero_extend:DI
> +            (match_operand:SI 2 "aarch64_reg_or_zero" "rZ"))
> +          (zero_extend:DI
> +            (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
> +  ""
> +  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
> +  [(set_attr "type" "csel")]
> +)
> +

The operand to a zero_extend can never be a const_int, so operand 2
should just be a register_operand with an "r" constraint.
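I.e. something like (untested):

(define_insn "*csinv3_uxtw_insn"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (if_then_else:DI
          (match_operand 1 "aarch64_comparison_operation" "")
          (zero_extend:DI
            (match_operand:SI 2 "register_operand" "r"))
          (zero_extend:DI
            (NEG_NOT:SI (match_operand:SI 3 "register_operand" "r")))))]
  ""
  "cs<neg_not_cs>\\t%w0, %w2, %w3, %m1"
  [(set_attr "type" "csel")]
)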

At the risk of feature creep :-) a useful third pattern could be
to combine a zero-extended operator result with an existing DImode value.
In that case, the existing DImode value really can be "rZ" and should
always be in the "else" arm of the if_then_else.  E.g.:

unsigned long long
f(int a, unsigned long b, unsigned c)
{
  return a ? b : ~c;
}

unsigned long long
g(int a, unsigned c)
{
  return a ? 0 : ~c;
}

But that's definitely something that can be left for later.
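FWIW, the g case at least looks matchable along these lines (untested
sketch, hypothetical name): since writing a W register zero-extends,
the zero arm can be matched directly as a const_int 0 and implemented
with wzr:

(define_insn "*csinv3_uxtw_insn3"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (if_then_else:DI
          (match_operand 1 "aarch64_comparison_operation" "")
          (zero_extend:DI
            (NEG_NOT:SI (match_operand:SI 2 "register_operand" "r")))
          (const_int 0)))]
  ""
  "cs<neg_not_cs>\\t%w0, wzr, %w2, %M1"
  [(set_attr "type" "csel")]
)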

Thanks,
Richard

Reply via email to