I was just looking back through some old emails and saw that this was never 
reviewed.

On 26/04/2024 18:37, Andrew Pinski wrote:
> On many cores, the mov instruction is "free" so the sequence:
>         cmp     w0, #0
>         cset    w0, ne
>         add     w0, w0, 42
> is more expensive than just:
>         cmp     w0, #0
>         mov     w1, #42
>         cinc    w0, w1, ne
> 
> The reason why we get the add case is that the pattern csinc2<mode>_insn
> only accepts registers for the predicate and we so we don't get an cinc
> without that. The small change to the predicate of using general_operand
> instead allows the combine to match and then the mov is generated with
> the register allocator checks the constraints.

The main advantage of the new sequence is that the latency is reduced to two 
instructions on dual-issue or above machines: the cmp and the mov can execute 
in parallel.

But there's a downside to your solution: it invisibly adds some register 
pressure which isn't exposed until register allocation resolves the difference 
between general_operand and the 'r' constraint.  On aarch64 that might not 
matter too much (until it really hits in the middle of a hot loop), but I've 
historically found it better to avoid doing this.

But resolving this would need to be done by the expand code (or possibly the 
pass that does re-association).

Anyway, just my 2p worth.

R.

> 
> Built and tested on aarch64-linux-gnu with no regressions.
> 
>       PR target/112304
> 
> gcc/ChangeLog:
> 
>       * config/aarch64/aarch64.md (*csinc2<mode>_insn): Change the
>       predicate of the 1st operand to general_operand.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/aarch64/cinc-2.c: New test.
> 
> Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com>
> ---
>  gcc/config/aarch64/aarch64.md             |  2 +-
>  gcc/testsuite/gcc.target/aarch64/cinc-2.c | 11 +++++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/cinc-2.c
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index a6051ebfc5a..046a249475d 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4549,7 +4549,7 @@ (define_insn "aarch64_<crc_variant>"
>  (define_insn "*csinc2<mode>_insn"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
>          (plus:GPI (match_operand 2 "aarch64_comparison_operation" "")
> -                  (match_operand:GPI 1 "register_operand" "r")))]
> +                  (match_operand:GPI 1 "general_operand" "r")))]
>    ""
>    "cinc\\t%<w>0, %<w>1, %m2"
>    [(set_attr "type" "csel")]
> diff --git a/gcc/testsuite/gcc.target/aarch64/cinc-2.c 
> b/gcc/testsuite/gcc.target/aarch64/cinc-2.c
> new file mode 100644
> index 00000000000..dc68dfed40f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cinc-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-options "-O2" } */
> +/* PR target/112304 */
> +
> +int f(int a)
> +{
> +        return (a!=0)+42;
> +}
> +
> +/* This function should produce a cinc with a mov instead of a cset with an 
> add. */
> +/* { dg-final { scan-assembler-not "cset\t" } } */
> +/* { dg-final { scan-assembler "cinc\t" } } */

Reply via email to