On Wed, 28 Oct 2015, Kyrill Tkachov wrote: > Hi all, > > This is a follow up to the thread at > https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01646.html > I'm trying to increase the utilisation of conditional compare instructions on > aarch64 and I've identified > tree ifcombine as a place that improves the opportunities for that. > > The transformations in the ifcombine_ifandif function combine basic blocks > with an AND or an IOR when possible, > which is the form that the expander code in ccmp.c looks for when generating > conditional compares. > To recap from the referenced thread, currently ifcombine will only merge such > blocks if the inner block contains > only a single statement i.e. the condition. I'd like to make it more > aggressive. > > This patch allows merging of inner blocks with multiple statements, as long as > those statements are comparisons > or AND/IOR operations. We don't allow any number of such statements though. An > additional gating is introduced > via the PARAM_IFCOMBINE_THRESHOLD param which is used to gate the > estimate_num_insns_seq cost of the inner basic > block, as Richard suggested. Using this param, targets that don't utilise the > ccmp.c machinery can make sure that > the codegen effect from this patch is minimal. The default value for the > param is such that it has minimal > codegen difference on targets other than aarch64 (for which I set it to a > higher value) > > I tried allowing any kind of statements in the inner block, not just > the types mentioned above and using just the param to gate this, but this > didn't show good results on aarch64. > Pulling in the extra code into the unconditional path was not beneficial on > the whole.
I see. Originally I suggested this because you may have if (a > b) if (c + 3 < d) and thus the inner block contains tem = c + 3; if (tem < d) thus allow only stmts that feed the condition. You'd walk the condition stmts SSA uses and mark their defs (if in the BB), recursing on that defs uses (if not already marked). Then in the BB walk check for stmts not marked. But maybe that's too complicated for no visible benefit. I suppose the c + 3 would already break conditional-compare generation. But for generalizing the transform (not ccmp specific) it wouldn't matter whether the extra stmt is a compare or an add (an add may be even cheaper than a compare). Some specifics on the patch: + enum tree_code tcode = gimple_assign_rhs_code (stmt); + if (TREE_CODE_CLASS (tcode) != tcc_comparison + && tcode != TRUTH_AND_EXPR + && tcode != TRUTH_ANDIF_EXPR + && tcode != TRUTH_OR_EXPR + && tcode != TRUTH_ORIF_EXPR THRUTH_* cannot appear in GIMPLE. + && tcode != BIT_AND_EXPR + && tcode != BIT_IOR_EXPR + && tcode != BIT_NOT_EXPR) you miss BIT_XOR_EXPR. For the BIT_* cases you also want to constrain the LHS type to BOOLEAN_TYPE || INTEGRAL_TYPE_P && TYPE_PRECISION () == 1. + int inner_cost + = estimate_num_insns_seq (bb_seq (inner_cond_bb), + &eni_time_weights); please use optimize_bb_for_speed_p (inner_cond_bb) ? &eni_time_weights : &eni_size_weights. > With this approach we can avoid explicitly checking for target conditional > compare support in tree-ssa-ifcombine.c. > Also, with this patch I don't see the gcc.dg/pr66299-2.c regression that I > complained about in the > earlier thread. > > On aarch64 with this patch I see about 4.5% more ccmp instructions being > generated on SPEC2006. > SPECINT improves by about 0.4% on a Cortex-A57 platform and SPECFP shows no > regressions. > > On x86_64 there was no codegen difference for SPEC2006. > > What do people think of this approach? > > A concern is that the new param is essentially a new magic number replacing an > implicit magic number of '1'. > It feels that how aggressively we do this for ccmp generation ultimately > should depend on BRANCH_COST somehow, > but BRANCH_COST is already overloaded and meaningless as discussed in other > threads Ideally we'd have a way to query the target about the cost of the branch we remove in favor of straight-line code. In the ccmp case I can imagine this would be quite "interesting" to implement... (we can't really make sure we'll end up with a ccmp in the end). So pragmatically (and in accordance with the Cauldron discussion) we'd add a gimple_cost target hook which we could feed the GIMPLE_COND we remove and compare against the cost of a gimple stmt computing the value of the comparison plus a bitwise and (plus the corresponding transform of the inner condition to one testing a boolean value we had to push to a register). Richard. > Thanks, > Kyrill > > 2015-10-28 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > * params.def (PARAM_IFCOMBINE_THRESHOLD): New param. > * tree-ssa-ifcombine.c: Include tree-inline.h and params.h. > (gimple_seq_contains_logical_ops_p): New function. > (ifcombine_ifandif): Use it. > * config/aarch64/aarch64.c: Include params.h > (aarch64_override_options): Set PARAM_IFCOMBINE_THRESHOLD. > > 2015-10-28 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > * gcc.target/aarch64/ifcombine_cond_cmp_1.c: New test. > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)