Re: [PATCH][RFC][tree-ifcombine] Making tree-ifcombine more friendly for conditional comparisons generation

Richard Biener Wed, 28 Oct 2015 06:33:42 -0700

On Wed, 28 Oct 2015, Kyrill Tkachov wrote:

> Hi all,
> 
> This is a follow up to the thread at
> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01646.html
> I'm trying to increase the utilisation of conditional compare instructions on
> aarch64 and I've identified
> tree ifcombine as a place that improves the opportunities for that.
> 
> The transformations in the ifcombine_ifandif function combine basic blocks
> with an AND or an IOR when possible,
> which is the form that the expander code in ccmp.c looks for when generating
> conditional compares.
> To recap from the referenced thread, currently ifcombine will only merge such
> blocks if the inner block contains
> only a single statement i.e. the condition. I'd like to make it more
> aggressive.
> 
> This patch allows merging of inner blocks with multiple statements, as long as
> those statements are comparisons
> or AND/IOR operations. We don't allow any number of such statements though. An
> additional gating is introduced
> via the PARAM_IFCOMBINE_THRESHOLD param which is used to gate the
> estimate_num_insns_seq cost of the inner basic
> block, as Richard suggested. Using this param, targets that don't utilise the
> ccmp.c machinery can make sure that
> the codegen effect from this patch is minimal.  The default value for the
> param is such that it has minimal
> codegen difference on targets other than aarch64 (for which I set it to a
> higher value)
> 
> I tried allowing any kind of statements in the inner block, not just
> the types mentioned above and using just the param to gate this, but this
> didn't show good results on aarch64.
> Pulling in the extra code into the unconditional path was not beneficial on
> the whole.


I see.  Originally I suggested this because you may have

  if (a > b)
    if (c + 3 < d)

and thus the inner block contains

   tem = c + 3;
   if (tem < d)

thus allow only stmts that feed the condition.  You'd walk the
condition stmts SSA uses and mark their defs (if in the BB),
recursing on that defs uses (if not already marked).  Then in the
BB walk check for stmts not marked.

But maybe that's too complicated for no visible benefit.  I suppose
the c + 3 would already break conditional-compare generation.  But
for generalizing the transform (not ccmp specific) it wouldn't matter
whether the extra stmt is a compare or an add (an add may be even cheaper
than a compare).

Some specifics on the patch:

+      enum tree_code tcode = gimple_assign_rhs_code (stmt);
+      if (TREE_CODE_CLASS (tcode) != tcc_comparison
+         && tcode != TRUTH_AND_EXPR
+         && tcode != TRUTH_ANDIF_EXPR
+         && tcode != TRUTH_OR_EXPR
+         && tcode != TRUTH_ORIF_EXPR

THRUTH_* cannot appear in GIMPLE.

+         && tcode != BIT_AND_EXPR
+         && tcode != BIT_IOR_EXPR
+         && tcode != BIT_NOT_EXPR)

you miss BIT_XOR_EXPR.  For the BIT_* cases you also want to
constrain the LHS type to BOOLEAN_TYPE || INTEGRAL_TYPE_P && 
TYPE_PRECISION () == 1.

+             int inner_cost
+               = estimate_num_insns_seq (bb_seq (inner_cond_bb),
+                                              &eni_time_weights);

please use optimize_bb_for_speed_p (inner_cond_bb) ? &eni_time_weights : 
&eni_size_weights.

> With this approach we can avoid explicitly checking for target conditional
> compare support in tree-ssa-ifcombine.c.
> Also, with this patch I don't see the gcc.dg/pr66299-2.c regression that I
> complained about in the
> earlier thread.
> 
> On aarch64 with this patch I see about 4.5% more ccmp instructions being
> generated on SPEC2006.
> SPECINT improves by about 0.4% on a Cortex-A57 platform and SPECFP shows no
> regressions.
> 
> On x86_64 there was no codegen difference for SPEC2006.
> 
> What do people think of this approach?
> 
> A concern is that the new param is essentially a new magic number replacing an
> implicit magic number of '1'.
> It feels that how aggressively we do this for ccmp generation ultimately
> should depend on BRANCH_COST somehow,
> but BRANCH_COST is already overloaded and meaningless as discussed in other
> threads

Ideally we'd have a way to query the target about the cost of the
branch we remove in favor of straight-line code.  In the ccmp case
I can imagine this would be quite "interesting" to implement...
(we can't really make sure we'll end up with a ccmp in the end).

So pragmatically (and in accordance with the Cauldron discussion)
we'd add a gimple_cost target hook which we could feed the
GIMPLE_COND we remove and compare against the cost of a gimple
stmt computing the value of the comparison plus a bitwise and
(plus the corresponding transform of the inner condition to
one testing a boolean value we had to push to a register).

Richard.

> Thanks,
> Kyrill
> 
> 2015-10-28  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> 
>     * params.def (PARAM_IFCOMBINE_THRESHOLD): New param.
>     * tree-ssa-ifcombine.c: Include tree-inline.h and params.h.
>     (gimple_seq_contains_logical_ops_p): New function.
>     (ifcombine_ifandif): Use it.
>     * config/aarch64/aarch64.c: Include params.h
>     (aarch64_override_options): Set PARAM_IFCOMBINE_THRESHOLD.
> 
> 2015-10-28  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> 
>     * gcc.target/aarch64/ifcombine_cond_cmp_1.c: New test.
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH][RFC][tree-ifcombine] Making tree-ifcombine more friendly for conditional comparisons generation

Reply via email to