I was just looking back through some old emails and saw that this was never reviewed.
On 26/04/2024 18:37, Andrew Pinski wrote: > On many cores, the mov instruction is "free" so the sequence: > cmp w0, #0 > cset w0, ne > add w0, w0, 42 > is more expensive than just: > cmp w0, #0 > mov w1, #42 > cinc w0, w1, ne > > The reason why we get the add case is that the pattern csinc2<mode>_insn > only accepts registers for the predicate and we so we don't get an cinc > without that. The small change to the predicate of using general_operand > instead allows the combine to match and then the mov is generated with > the register allocator checks the constraints. The main advantage of the new sequence is that the latency is reduced to two instructions on dual-issue or above machines: the cmp and the mov can execute in parallel. But there's a downside to your solution: it invisibly adds some register pressure which isn't exposed until register allocation resolves the difference between general_operand and the 'r' constraint. On aarch64 that might not matter too much (until it really hits in the middle of a hot loop), but I've historically found it better to avoid doing this. But resolving this would need to be done by the expand code (or possibly the pass that does re-association). Anyway, just my 2p worth. R. > > Built and tested on aarch64-linux-gnu with no regressions. > > PR target/112304 > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (*csinc2<mode>_insn): Change the > predicate of the 1st operand to general_operand. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/cinc-2.c: New test. > > Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com> > --- > gcc/config/aarch64/aarch64.md | 2 +- > gcc/testsuite/gcc.target/aarch64/cinc-2.c | 11 +++++++++++ > 2 files changed, 12 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/cinc-2.c > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index a6051ebfc5a..046a249475d 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -4549,7 +4549,7 @@ (define_insn "aarch64_<crc_variant>" > (define_insn "*csinc2<mode>_insn" > [(set (match_operand:GPI 0 "register_operand" "=r") > (plus:GPI (match_operand 2 "aarch64_comparison_operation" "") > - (match_operand:GPI 1 "register_operand" "r")))] > + (match_operand:GPI 1 "general_operand" "r")))] > "" > "cinc\\t%<w>0, %<w>1, %m2" > [(set_attr "type" "csel")] > diff --git a/gcc/testsuite/gcc.target/aarch64/cinc-2.c > b/gcc/testsuite/gcc.target/aarch64/cinc-2.c > new file mode 100644 > index 00000000000..dc68dfed40f > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/cinc-2.c > @@ -0,0 +1,11 @@ > +/* { dg-options "-O2" } */ > +/* PR target/112304 */ > + > +int f(int a) > +{ > + return (a!=0)+42; > +} > + > +/* This function should produce a cinc with a mov instead of a cset with an > add. */ > +/* { dg-final { scan-assembler-not "cset\t" } } */ > +/* { dg-final { scan-assembler "cinc\t" } } */