On 08/10/2017 03:14 PM, Michael Collison wrote: > Hi all, > > One issue that we keep encountering on aarch64 is GCC not making good use of > the flag-setting arithmetic instructions > like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare > the result against zero. > They are represented in a fairly standard way in the backend as PARALLEL > patterns: > (parallel [(set (reg x1) (op (reg x2) (reg x3))) > (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))]) > > GCC isn't forming these from separate arithmetic and comparison instructions > as aggressively as it could. > A particular pain point is when the result of the arithmetic insn is used > before the comparison instruction. > The testcase in this patch is one such example where we have: > (insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73]) > (plus:SI (reg:SI 0 x0 [ x ]) > (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64} > (nil)) > (insn 33 7 34 2 (set (reg:SI 1 x1 [77]) > (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73]) > (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64} > (nil)) > (insn 34 33 17 2 (set (reg:CC 66 cc) > (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73]) > (const_int 0 [0]))) "comb.c":4 391 {cmpsi} > (nil)) > > This scares combine away as x0 is used in insn 33 as well as the comparison > in insn 34. > I think the compare-elim pass can help us here. Is it the multiple use or the hard register that combine doesn't appreciate. The latter would definitely steer us towards compare-elim.
> > This patch extends it by handling comparisons against zero, finding the > defining instruction of the compare > and merging the comparison with the defining instruction into a PARALLEL that > will hopefully match the form > described above. If between the comparison and the defining insn we find an > instruction that uses the condition > registers or any control flow we bail out, and we don't cross basic blocks. > This simple technique allows us to catch cases such as this example. > > Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu > and x86_64. > > Ok for trunk? > > 2017-08-05 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > Michael Collison <michael.colli...@arm.com> > > * compare-elim.c: Include emit-rtl.h. > (can_merge_compare_into_arith): New function. > (try_validate_parallel): Likewise. > (try_merge_compare): Likewise. > (try_eliminate_compare): Call the above when no previous clobber > is available. > (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN > dataflow problems. > > 2017-08-05 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > Michael Collison <michael.colli...@arm.com> > > * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test. > > > pr5198v1.patch > > > diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c > index 7e557a2..c65d155 100644 > --- a/gcc/compare-elim.c > +++ b/gcc/compare-elim.c > @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3. If not see > #include "tm_p.h" > #include "insn-config.h" > #include "recog.h" > +#include "emit-rtl.h" > #include "cfgrtl.h" > #include "tree-pass.h" > #include "domwalk.h" > @@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, > rtx_insn *start) > return reg; > } > > +/* Return true if it is okay to merge the comparison CMP_INSN with > + the instruction ARITH_INSN. Both instructions are assumed to be in the > + same basic block with ARITH_INSN appearing before CMP_INSN. This checks > + that there are no uses or defs of the condition flags or control flow > + changes between the two instructions. */ > + > +static bool > +can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn) > +{ > + for (rtx_insn *insn = PREV_INSN (cmp_insn); > + insn && insn != arith_insn; > + insn = PREV_INSN (insn)) > + { > + if (!NONDEBUG_INSN_P (insn)) > + continue; > + /* Bail if there are jumps or calls in between. */ > + if (!NONJUMP_INSN_P (insn)) > + return false; > + > + df_ref ref; > + /* Find a USE of the flags register. */ > + FOR_EACH_INSN_USE (ref, insn) > + if (DF_REF_REGNO (ref) == targetm.flags_regnum) > + return false; > + > + /* Find a DEF of the flags register. */ > + FOR_EACH_INSN_DEF (ref, insn) > + if (DF_REF_REGNO (ref) == targetm.flags_regnum) > + return false; > + } > + return true; > +} What about old style asms? Do you need to explicit punt those? I don't think they carry DF info. Don't you also have to verify that the inputs to ARITH_INSN are unchanged between ARITH_INSN and CMP_INSN? > + > +/* Given two SET expressions, SET_A and SET_B determine whether they form > + a recognizable pattern when emitted in parallel. Return that parallel > + if so. Otherwise return NULL. This tries both permutations of SET_A > + and SET_B within the PARALLEL. */ > + > +static rtx > +try_validate_parallel (rtx set_a, rtx set_b) > +{ > + rtx par > + = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_a, set_b)); > + rtx_insn *seq; > + start_sequence (); > + seq = emit_insn (par); > + end_sequence (); > + if (recog_memoized (seq) > 0) > + return par; > + > + par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_b, set_a)); > + start_sequence (); > + seq = emit_insn (par); > + end_sequence (); > + return recog_memoized (seq) > 0 ? par : NULL_RTX; > +} I don't think you need to build up a sequence here. Can't you just allocate the INSN container and set its PATTERN? > + > +/* For a comparison instruction described by CMP check if it compares a > + register with zero i.e. it is of the form CC := CMP R1, 0. > + If it is, find the instruction defining R1 (say I1) and try to create a > + PARALLEL consisting of I1 and the comparison, representing a flag-setting > + arithmetic instruction. Example: > + I1: R1 := R2 + R3 > + <instructions that don't read the condition register> > + I2: CC := CMP R1 0 > + I2 can be merged with I1 into: > + I1: { R1 := R2 + R3 ; CC := CMP (R2 + R3) 0 } > + This catches cases where R1 is used between I1 and I2 and therefore > + combine and other RTL optimisations will not try to propagate it into > + I2. Return true if we succeeded in merging CMP. */ > + > +static bool > +try_merge_compare (struct comparison *cmp) > +{ > + rtx_insn *cmp_insn = cmp->insn; > + > + if (!REG_P (cmp->in_a) || cmp->in_b != const0_rtx) > + return false; > + rtx in_a = cmp->in_a; > + if (!in_a) > + return false; Isn't the second IF redundant? How can !in_a ever be true here given we tested REG_P (cmp->in_a)? > + df_ref use; > + > + FOR_EACH_INSN_USE (use, cmp_insn) > + if (DF_REF_REGNO (use) == REGNO (in_a)) > + break; > + if (!use) > + return false; > + > + struct df_link *ref_chain; > + ref_chain = DF_REF_CHAIN (use); > + if (!ref_chain || !ref_chain->ref > + || !DF_REF_INSN_INFO (ref_chain->ref) || ref_chain->next != NULL) > + return false; So what are you checking for here? I've got a pretty good guess, but let's just make it explicit in a comment. Generally this looks pretty close. THe biggest concern I see is I think you need to verify that the inputs on the arthmetic insn don't change between the arithmetic and the compare. jeff