On 08/10/2017 03:14 PM, Michael Collison wrote:
> Hi all,
> 
> One issue that we keep encountering on aarch64 is GCC not making good use of 
> the flag-setting arithmetic instructions
> like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare 
> the result against zero.
> They are represented in a fairly standard way in the backend as PARALLEL 
> patterns:
> (parallel [(set (reg x1) (op (reg x2) (reg x3)))
>            (set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])
> 
> GCC isn't forming these from separate arithmetic and comparison instructions 
> as aggressively as it could.
> A particular pain point is when the result of the arithmetic insn is used 
> before the comparison instruction.
> The testcase in this patch is one such example where we have:
> (insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>         (plus:SI (reg:SI 0 x0 [ x ])
>             (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
>      (nil))
> (insn 33 7 34 2 (set (reg:SI 1 x1 [77])
>         (plus:SI (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>             (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
>      (nil))
> (insn 34 33 17 2 (set (reg:CC 66 cc)
>         (compare:CC (reg/v:SI 0 x0 [orig:73 <retval> ] [73])
>             (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
>      (nil))
> 
> This scares combine away as x0 is used in insn 33 as well as the comparison 
> in insn 34.
> I think the compare-elim pass can help us here.
Is it the multiple use or the hard register that combine doesn't
appreciate.  The latter would definitely steer us towards compare-elim.

> 
> This patch extends it by handling comparisons against zero, finding the 
> defining instruction of the compare
> and merging the comparison with the defining instruction into a PARALLEL that 
> will hopefully match the form
> described above. If between the comparison and the defining insn we find an 
> instruction that uses the condition
> registers or any control flow we bail out, and we don't cross basic blocks.
> This simple technique allows us to catch cases such as this example.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu 
> and x86_64.
> 
> Ok for trunk?
> 
> 2017-08-05  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>           Michael Collison <michael.colli...@arm.com>
> 
>       * compare-elim.c: Include emit-rtl.h.
>       (can_merge_compare_into_arith): New function.
>       (try_validate_parallel): Likewise.
>       (try_merge_compare): Likewise.
>       (try_eliminate_compare): Call the above when no previous clobber
>       is available.
>       (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
>       dataflow problems.
> 
> 2017-08-05  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>           Michael Collison <michael.colli...@arm.com>
>           
>       * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.
> 
> 
> pr5198v1.patch
> 
> 
> diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
> index 7e557a2..c65d155 100644
> --- a/gcc/compare-elim.c
> +++ b/gcc/compare-elim.c
> @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm_p.h"
>  #include "insn-config.h"
>  #include "recog.h"
> +#include "emit-rtl.h"
>  #include "cfgrtl.h"
>  #include "tree-pass.h"
>  #include "domwalk.h"
> @@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, 
> rtx_insn *start)
>    return reg;
>  }
>  
> +/* Return true if it is okay to merge the comparison CMP_INSN with
> +   the instruction ARITH_INSN.  Both instructions are assumed to be in the
> +   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
> +   that there are no uses or defs of the condition flags or control flow
> +   changes between the two instructions.  */
> +
> +static bool
> +can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
> +{
> +  for (rtx_insn *insn = PREV_INSN (cmp_insn);
> +       insn && insn != arith_insn;
> +       insn = PREV_INSN (insn))
> +    {
> +      if (!NONDEBUG_INSN_P (insn))
> +     continue;
> +      /* Bail if there are jumps or calls in between.  */
> +      if (!NONJUMP_INSN_P (insn))
> +     return false;
> +
> +      df_ref ref;
> +      /* Find a USE of the flags register.  */
> +      FOR_EACH_INSN_USE (ref, insn)
> +     if (DF_REF_REGNO (ref) == targetm.flags_regnum)
> +       return false;
> +
> +      /* Find a DEF of the flags register.  */
> +      FOR_EACH_INSN_DEF (ref, insn)
> +     if (DF_REF_REGNO (ref) == targetm.flags_regnum)
> +       return false;
> +    }
> +  return true;
> +}
What about old style asms?  Do you need to explicit punt those?  I don't
think they carry DF info.

Don't you also have to verify that the inputs to ARITH_INSN are
unchanged between ARITH_INSN and CMP_INSN?


> +
> +/* Given two SET expressions, SET_A and SET_B determine whether they form
> +   a recognizable pattern when emitted in parallel.  Return that parallel
> +   if so.  Otherwise return NULL.  This tries both permutations of SET_A
> +   and SET_B within the PARALLEL.  */
> +
> +static rtx
> +try_validate_parallel (rtx set_a, rtx set_b)
> +{
> +  rtx par
> +    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_a, set_b));
> +  rtx_insn *seq;
> +  start_sequence ();
> +  seq = emit_insn (par);
> +  end_sequence ();
> +  if (recog_memoized (seq) > 0)
> +    return par;
> +
> +  par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set_b, set_a));
> +  start_sequence ();
> +  seq = emit_insn (par);
> +  end_sequence ();
> +  return recog_memoized (seq) > 0 ? par : NULL_RTX;
> +}
I don't think you need to build up a sequence here.   Can't you just
allocate the INSN container and set its PATTERN?

> +
> +/* For a comparison instruction described by CMP check if it compares a
> +   register with zero i.e. it is of the form CC := CMP R1, 0.
> +   If it is, find the instruction defining R1 (say I1) and try to create a
> +   PARALLEL consisting of I1 and the comparison, representing a flag-setting
> +   arithmetic instruction.  Example:
> +   I1: R1 := R2 + R3
> +   <instructions that don't read the condition register>
> +   I2: CC := CMP R1 0
> +   I2 can be merged with I1 into:
> +   I1: { R1 := R2 + R3 ; CC := CMP (R2 + R3) 0 }
> +   This catches cases where R1 is used between I1 and I2 and therefore
> +   combine and other RTL optimisations will not try to propagate it into
> +   I2.  Return true if we succeeded in merging CMP.  */
> +
> +static bool
> +try_merge_compare (struct comparison *cmp)
> +{
> +  rtx_insn *cmp_insn = cmp->insn;
> +
> +  if (!REG_P (cmp->in_a) || cmp->in_b != const0_rtx)
> +    return false;
> +  rtx in_a = cmp->in_a;
> +  if (!in_a)
> +    return false;
Isn't the second IF redundant?  How can !in_a ever be true here given we
tested REG_P (cmp->in_a)?


> +  df_ref use;
> +
> +  FOR_EACH_INSN_USE (use, cmp_insn)
> +    if (DF_REF_REGNO (use) == REGNO (in_a))
> +      break;
> +  if (!use)
> +    return false;
> +
> +  struct df_link *ref_chain;
> +  ref_chain = DF_REF_CHAIN (use);
> +  if (!ref_chain || !ref_chain->ref
> +      || !DF_REF_INSN_INFO (ref_chain->ref) || ref_chain->next != NULL)
> +    return false;
So what are you checking for here?  I've got a pretty good guess, but
let's just make it explicit in a comment.


Generally this looks pretty close.  THe biggest concern I see is I think
you need to verify that the inputs on the arthmetic insn don't change
between the arithmetic and the compare.

jeff

Reply via email to