On Wed, Sep 18, 2013 at 11:45 AM, Zhenqiang Chen <zhenqiang.c...@arm.com> wrote: > >> -----Original Message----- >> From: Richard Biener [mailto:richard.guent...@gmail.com] >> Sent: Tuesday, August 27, 2013 8:18 PM >> To: Richard Earnshaw >> Cc: Zhenqiang Chen; GCC Patches >> Subject: Re: [PATCH 1/n] Add conditional compare support >> >> On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw <rearn...@arm.com> >> wrote: >> > On 27/08/13 12:10, Richard Biener wrote: >> >> What's this for and what's the desired semantics? I don't like having >> >> extra tree codes for this. Is this for a specific instruction set >> >> feature? >> > >> > The background is to support the conditional compare instructions in >> > ARM (more effectively) and AArch64 at all. >> > >> > The current method used in ARM is to expand into a series of >> > store-flag instructions and then hope that combine can optimize them >> > away (though that fails far too often, particularly when the first >> > instruction in the sequence is combined into another pattern). To >> > make it work at all the compiler has to lie about the costs of various >> > store-flag type operations which overall risks producing worse code >> > and means we also have to support many more complex multi-instruction >> > patterns than is desirable. I really don't want to go down the same > route >> for AArch64. >> > >> > The idea behind all this is to capture potential conditional compare >> > operations early enough in the mid end that we can keep track of them >> > until RTL expand time and then to emit the correct logic on all >> > targets depending on what is the best thing for that target. The >> > current method of lowering into store-flag sequences doesn't really cut > it. >> >> It seems to me that then the initial instruction selection process (aka > RTL >> expansion) needs to be improved. As we are expanding with having the CFG >> around it should be easy enough to detect AND/ORIF cases and do better >> here. Yeah, I suppose this asks to turn existing jump expansion > optimizations >> up-side-down to optimize with the GIMPLE CFG in mind. >> >> The current way of LOGICAL_OP_NON_SHORT_CIRCUIT is certainly bogus - >> fold-const.c is way too early to decide this. Similar to the ongoing work > of >> expanding / building-up switch expressions in a GIMPLE pass, moving expand >> complexity up the pipeline this asks for a GIMPLE phase that moves this >> decision down closer to RTL expansion. >> (We have tree-ssa-ifcombine.c that is a related GIMPLE transform pass) >> > > The patch is updated according to your comments. It is a basic support, > which does not touch ifcombine and jump related optimizations yet. > > Current method is: > 1) In fold-const, if HAVE_conditional_compare, set > LOGICAL_OP_NON_SHORT_CIRCUIT > to optimize_function_for_speed_p. So we do not depend on BRANCH_COST. > 2) Identify CCMP during expanding. A CCMP candidate is a BIT_AND_EXPR > or BIT_IOR_EXPR, whose operators are compares. > 3) Add a new op in optab to expand the CCMP to optimized RTL, > e.g. and_scc_scc/ior_scc_scc in ARM. > > Bootstrap on ARM Chrome book. > No make check regression on Pandaboard.
That's better. A few observations +static bool +is_conditional_compare_candidate_p (gimple g) +{ + tree rhs = gimple_assign_rhs_to_tree (g); + tree lhs, op0, op1; + gimple gs0, gs1; + + if (TREE_CODE (rhs) != BIT_AND_EXPR && TREE_CODE (rhs) != BIT_IOR_EXPR) + return false; + + lhs = gimple_assign_lhs (g); + op0 = TREE_OPERAND (rhs, 0); + op1 = TREE_OPERAND (rhs, 1); + + if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME) + || !has_single_use (lhs)) + return false; + + gs0 = SSA_NAME_DEF_STMT (op0); + gs1 = SSA_NAME_DEF_STMT (op1); you are not allowed to use arbitrary definition stmts for code generation (you may have a look at them though). Use the get_gimple_for_ssa_name helper to only access definition stmts that have no code generated yet and that you can modify. + gimple gs0 = SSA_NAME_DEF_STMT (TREE_OPERAND (exp, 0)); + gimple gs1 = SSA_NAME_DEF_STMT (TREE_OPERAND (exp, 1)); See above. + if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0))) + && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)))) + { + rtx op0, op1, op2, op3, tmp; + tree cmp0_op0, cmp0_op1, cmp1_op0, cmp1_op1; + enum machine_mode mode; + + cmp0_op0 = gimple_assign_rhs1 (gs0); + cmp0_op1 = gimple_assign_rhs2 (gs0); + cmp1_op0 = gimple_assign_rhs1 (gs1); + cmp1_op1 = gimple_assign_rhs2 (gs1); + + expand_operands (cmp0_op0, cmp0_op1, NULL_RTX, &op0, &op1, EXPAND_NORMAL); + expand_operands (cmp1_op0, cmp1_op1, NULL_RTX, &op2, &op3, EXPAND_NORMAL); as you are not expanding the result of gs0 or gs1 you indeed have to use get_gimple_for_ssa_name. + else + r = expand_expr_real (rhs, target, tmode, modifier, NULL); +#else + r = expand_expr_real (rhs, target, tmode, modifier, NULL); +#endif just place the #else after the else, no need to duplicate the expand_expr_real call. +#ifdef HAVE_conditional_compare +#undef LOGICAL_OP_NON_SHORT_CIRCUIT +#define LOGICAL_OP_NON_SHORT_CIRCUIT optimize_function_for_speed_p (cfun) +#else #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT #define LOGICAL_OP_NON_SHORT_CIRCUIT \ (BRANCH_COST (optimize_function_for_speed_p (cfun), \ false) >= 2) #endif +#endif that's of course a hack ;) If we want to keep LOGICAL_OP_NON_SHORT_CIRCUIT then it should become a target hook at this point. But I'd argue that at the level of fold-const.c it should simply always be false. @@ -187,6 +187,7 @@ OPTAB_D (movcc_optab, "mov$acc") OPTAB_D (cmov_optab, "cmov$a6") OPTAB_D (cstore_optab, "cstore$a4") OPTAB_D (ctrap_optab, "ctrap$a4") +OPTAB_D (ccmp_optab, "conditional_compare") I'd say the two should match, so ccmp_obtab, "ccmp" You miss updates to md.texi for the new named patterns. Richard. > Thanks! > -Zhenqiang > > ChangeLog: > 2013-09-18 Zhenqiang Chen <zhenqiang.c...@linaro.org> > > * config/arm/arm.md (conditional_compare): New. > * expr.c (is_conditional_compare_candidate_p, expand_ccmp_expr): > New. > (expand_expr_real_1): Identify conditional compare. > * fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT): Update. > * optabs.c (expand_ccmp_op): New. > (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR. > * optabs.def (ccmp_optab): New. > * optabs.h (expand_ccmp_op): New.