Re: [PATCH] fold fold_truth_andor field merging into ifcombine

Alexandre Oliva Sun, 20 Oct 2024 19:42:33 -0700

On Oct 10, 2024, Richard Biener <richard.guent...@gmail.com> wrote:

> Thanks for working on this.  There's #if 0 portions in the patch - did you
> send the correct version?


'fraid so.  Sorry, I'd forgotten about that bit.  Long story (not so)
short, this patch is a bit of a frankenstein monster: there's a large
portion from the original fold_truth_andor patch, that deals with the
bit field merging; there's a largish portion from a couple of years ago,
in which I set out to reconstruct the mergeable expressions in gimple
for ifcombine, but failed because vuses would unavoidably be present,
and the latest recent try in which I went ahead with the bits that
dropped vuses, and managed to introduce in ifcombine what I considered
the most valuable contribution in this patch, namely, the combination of
non-neighbor conditions.  The logic from each phase was not really
touched in subsequent phases, and years had elapsed, so the recollection
of these cpp conditionals had slipped out of my mind.

My swapped-back-in realization is that I took a copy of
ssa_is_replaceable_p and made it ssa_is_substitutable_p, with some
undesired conditions removed by #if 0 along wiht declarations only used
in those bits.  The idea back then was to highlight the differences, so
that, when revisiting it, I coulud perhaps refactor and combine the
functions if that made sense.  I'm not sure it does.

> I wonder if we can check the locations as to whether the spelling location
> is different and suppress diagnostics when macro expansions were involved?

Presumably.  I had tracked down the ppc condition back in the
fold_truth_andoor version of the patch, but the x86 hits are newer, and
the line numbers didn't help me find the errors, so I don't even know
whether it's macros that we're talking about.  These new warnings were
encouraging, for they showed that the ifcombine incarnation of the patch
covered additional combinations that the fold-based one couldn't.  But
they only came up in the latest version, so I know the mutually
exclusive conditions are not neighbors.

But I digress.  The relevant information here is that the line numbers
are off, so I don't even know whether macros are involved.

> Adding -Wno-error should be the last resort and I suspect such cases
> happen in user code as well?

I'm pretty sure this will come up in user code, yeah, but isn't that
something we'd *want* to flag, rather than suppress?  I mean, mutually
exclusive conditions are probably something worth flagging and looking
into even (especially?) if they come from macros.  The fact that we
missed them before doesn't strike me as enough of a reason to silence
them.

>> +/* Return TRUE if expression STMT is suitable for replacement.  ???
>> +   Same as ssa_is_replaceable_p, except that we don't insist it has a
>> +   single use.  */
>> 
>> +static bool
>> +ssa_is_substitutable_p (gimple *stmt)

> I don't yet see how many times you use this - there's other passes
> which need to query a subset of this like DCE or DSE for whether
> a stmt can be elided.

Only twice, in the bits from phase2 that recognize and reconstruct from
gimple condition operand trees that could be merged.  I realize now that
perhaps gimple matching could do this job, so perhaps I can get rid of
this.

> This is mostly gimple_has_side_effects () but there's also
> !cfun->can_delete_dead_exceptions && stmt_could_throw_p (cfun, stmt)

Erhm, I don't think the paragraph above applies to
ssa_is_substitutable_p.

> I think the name is probably bad and suggests it's generally applicable,
> so maybe rename it to something more specific for your use?

ACK, will do if I don't get rid of it.

TBH, I don't think I've given this modified version of
tree-outof-ssa.cc's ssa_is_replaceable_p() much thought, I just knew I
didn't mind combining/merging trees even if they had additional uses.

Should I apply your suggestions for it to the original function?

>> +static tree
>> +is_cast_p (tree *name)

> very short name, add a prefix?

Will do if it survives.

>> +{
>> +  tree type = 0;
>> +
>> +  while (TREE_CODE (*name) == SSA_NAME
>> +        && !SSA_NAME_IS_DEFAULT_DEF (*name))
>> +    {
>> +      gimple *def = SSA_NAME_DEF_STMT (*name);
>> +
>> +      if (!ssa_is_substitutable_p (def))

> Overly restrictive/expensive?  Just check if it's a GIMPLE assign?

This is from phase2, so the details escape me already, but I don't think
that would be enough.  We need to know that the definition is movable in
principle, presumably to the point of use.  If we combine conditions,
we'll end up evaluating it at that point.

>> +      if (gimple_assign_single_p (def))
>> +       {
>> +         if (gimple_assign_load_p (def))
>> +           break;
>> +         *name = gimple_assign_rhs1 (def);

> that could be a constant, or a REALPART_EXPR.  Maybe move after ...

>> +         continue;
>> +       }
>> +
>> +      if (!gimple_assign_cast_p (def))
>> +       break;

> ... this?

I don't get what you're getting at here.

IIRC this function recognizes gimple that, in the fold logic, used to
come up in (bit)field extraction and manipulation for testing,
particularly type widening and narrowing.  It is supposed to preserve
the preexisting logic.

>> +
>> +      if (!type)
>> +       type = TREE_TYPE (*name);
>> +      *name = gimple_assign_rhs1 (def);
>> +    }
>> +
>> +  return type;

> Overall this is a very odd function as it strips even (widen)(truncate)x?

It's not really stripping anything, just identifying the initial type of
the field that underwent type casts.

>> +  /* We are interested in the bare arrangement of bits, so strip everything
>> +     that doesn't affect the machine mode.  However, record the type of the
>> +     outermost expression if it may matter below.  */
>> +  outer_type = is_cast_p (&exp);

> I think is_cast_p strips more than casts not affecting the machine mode.
> It looks like you want a STRIP_NOPS that works on GIMPLE?

I don't think so.  IIRC this is modeled after preexisting code that did
more than STRIP_NOPs, it followed CONVERT_EXPR_P as well.

> I'm not sure if it would scale, but it looks like there's a certain
> set of patterns
> this tries to match - did you think of using a (match ...) pattern in match.pd
> instead?

I didn't.  I recall looking into such pattern matching at some point,
before getting started on ifcombine, but I didn't revisit it.  The
patterns I needed to recognize to preserve the fold functionality were
so varied depending on details that I wasn't sure that would be useful.
But I guess now I should give it a try.

>> +static tree
>> +unextend (tree c, int p, int unsignedp, tree mask)

> I'd like all of the above to be done via wide_int, that has utilities to build
> (shifted) masks and sign- or zero-extend from specific bits.

Ugh :-) I was hoping I wouldn't have to revisit the bit-manipulation
logic taken from the earliest version of the patch, let alone rewrite it
:-)

>> +  tree ref = make_bit_field_ref (loc, unshare_expr (inner),
>> +                                unshare_expr (orig_inner),
>> +                                type, bitsize, bitpos,

> I suppose 'inner' is what the earlier get_inner_reference returned - I don't
> think this built load properly preserves TBAA properties, at least
> a MEM[(char *)&a] would have an inner of 'a'?

These bitfield references combine access to multiple fields of a.  It's
not very much unlike accessing the entirety of 'a'.  What else
could/should it be?

>> +build_split_load (tree /* out */ ln_arg[2],

>> +      tree type = lang_hooks.types.type_for_size (bitsiz[i], 1);

> Use type_for_mode?

Hmm, I'm pretty sure I tried something like that but had to resort to
this back in the first version.

>> +static int
>> +mergeable_loads_p (gimple *lstmt, gimple *rstmt)

>> +  if (stmt_dominates_stmt_p (lstmt, rstmt))

> Can you avoid these please?

I'm not sure.  IIRC we need to know which stmt dominates the other.  But
that's only to know where to insert prep stmts; maybe we can return them
and leave it for the caller to gimplify as needed.

>   if (gimple_vuse (stmts[0]) != gimple_vuse (stmts[1]))
>     return 0;

*doh*, yeah, right, thanks!

> The function is large - maybe some factoring is possible?  I also notice
> you return a GENERIC expression here where having a GIMPLE value
> and an alternate gimple_seq output for defining stmts would be better?

It would have to be two seqs, to handle split cases.
I'll give that a try.

>> +static bool
>> +constant_condition_p (gcond *cond)
>> +{
>> +  if (!cond)
>> +    return true;

> It's a bit odd this returns true for no condition. Maybe
> static_outgoing_edge_p ()?

That name would work.  I'll think of other possibilities.  I didn't
expect that covering the degenerate case would be a problem.

> (what about a throwing stmt?)

That would prevent ifcombine from considering the block.

> IMO inlining the predicate to it's few(?) uses
> avoids giving it a "proper" name ;)

I'll consider that.  Are you suggesting inlining the second
constant_condition_p (that takes a bb), the one quoted above, or both?

>> +/* Same as recognize_if_then_else, but check that the condition is not
>> +   constant.  It is not useful to combine constant conditions.  */

> I do wonder why we ever get to the situation with constant conditions,
> thus why recognize_if_then_else should match them in the first place?

Two situations:

- when there's a straight-line block (that could be cleaned up)
  separating blocks with combinable conditions

- when we're crossing a block whose condition has already been combined,
  looking for other conditions to combine in earlier blocks

>> +static bool
>> +recognize_if_succs (basic_block cond_bb,
>> +                   basic_block succ1, basic_block succ2)

> Can you use

>   basic_block tem_then = NULL, tem_else = NULL;
>   recognize_if_then_else (cond_bb, &tem_then, &tem_else);

> or even

>    (recognize_if_then_else (cond_bb, succ1, succ2)
>     || recognize_if_then_else (cond_bb, succ2, succ1))

> at callers instead of adding another function?

I could, I recall that it worked that way before I optimized it to avoid
various tests that we don't need but the other function performs, and
even the indirection and optional filling-in that we don't need.


>> @@ -129,7 +183,7 @@ bb_no_side_effects_p (basic_block bb)
>> enum tree_code rhs_code;
>> if (gimple_has_side_effects (stmt)
>> || gimple_could_trap_p (stmt)
>> -         || gimple_vuse (stmt)
>> +         /* || gimple_vuse (stmt) */

> I think you want to preserve || gimple_vdef (stmt) at least - this possibly
> saves you from checking for intermediate stores.

That's covered by gimple_has_side_effects, I hope.
(checks)
Hmm, apparently not.
Comments in there, and even its name, are now very misleading.

Yeah, looks like we need to rule out gimple_vdef.

> As you special-case hard register uses, do we want to ever make hard-register
> uses or defs unconditional?  I'll note that on GIMPLE hard register uses/defs
> are treated as memory, so they have virtual defs and uses.

Erhm, do I?  I don't recall that, and I can't tell what you're getting
at.

>> +/* Replace the conditions in INNER_COND and OUTER_COND with COND and COND2.
>> +   COND and COND2 are computed for insertion at INNER_COND, with OUTER_COND
>> +   replaced with a constant, but if there are intervening blocks, it's best 
>> to
>> +   adjust COND for insertion at OUTER_COND, placing COND2 at INNER_COND.  */

> What kind of "intervening" blocks could there be?

(a.f1 == b.f1 // bb A
 && x != y // bb B
 && a.f2 == b.f2 // bb C
... )

We may be able to combine C into A, without penalty, as long as B has no
side effects that prevent the combination.

>> +static tree
>> +ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
>> +                       gcond *outer_cond, bool outer_inv,
>> +                       tree cond, bool must_canon,
>> +                       tree cond2)
>> +{
[...]
>> +  if (tree tcanon = canonicalize_cond_expr_cond (cond))
>> +    cond = tcanon;
>> +  else if (must_canon)
>> +    return NULL_TREE;

> ?  I think you should use gimple_build or gimplify what was built to
> a is_gimple_condexpr_for_cond like ifcombine does elsehwere.

That bit is refactored from preexisting code.  Some paths required
canonicalization to succeed.  I only preserved that.

>> -  /* Path outer_cond_bb->(outer2) needs to be merged into path
>> -     outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
>> -     and probability of inner_not_taken updated.  */
>> +       /* Mark SSA DEFs that are referenced by cond and may thus need to be
>> +          moved to outer.  */
>> +       {
>> +         ifcombine_mark_ssa_name_t data = { used, outer_bb };
>> +         walk_tree (&cond, ifcombine_mark_ssa_name_walk, &data, NULL);
>> +       }

> I think this moving code should be an enhancement ontop of the basic
> functionality - that is, I'd insert at inner_cond.

Combining A into C (from my example above), as you suggest here, could
penalize a carefully written sequence of conditions, so I'd rather not
do that.  It wouldn't even be an optimization.

> The intervening blocks must
> necessarily form straight-line code and thus we're doing "scheduling" here
> which is premature on GIMPLE?

Maybe it wasn't clear from the description that the intervening blocks
could have multiple other unrelated conditions?

>> +      /* Leave CFG optimization to cfg_cleanup.  */
>> +      gimple_cond_set_condition_from_tree (outer_cond,
>> +                                          outer_inv
>> +                                          ? boolean_false_node
>> +                                          : boolean_true_node);

> There is gimple_cond_make_{true,false}

Yeah.  This code was factored from ifcombine_ifandif.
I don't like making unrelated changes during refactoring.

I understand I'm to separate the refactoring, so I might as well make
this cleanup (?) another separate patch.

>> +      if (!ifcombine_replace_cond (inner_cond, inner_inv,
>> +                                  outer_cond, outer_inv,
>> +                                  t, false, NULL_TREE))

> Can you split out factoring this function?

Heh.  Sure.  Funny.  I had it separate before I folded and consolidated
the patch to submit it.

>> -      if (!is_gimple_condexpr_for_cond (t))
>> -       {
>> -         gsi = gsi_for_stmt (inner_cond);
>> -         t = force_gimple_operand_gsi_1 (&gsi, t, 
>> is_gimple_condexpr_for_cond,
>> -                                         NULL, true, GSI_SAME_STMT);
>> -       }
>> -      gimple_cond_set_condition_from_tree (inner_cond, t);
>> -      update_stmt (inner_cond);
>> -
>> -      /* Leave CFG optimization to cfg_cleanup.  */
>> -      gimple_cond_set_condition_from_tree (outer_cond,
>> -       outer_inv ? boolean_false_node : boolean_true_node);
>> -      update_stmt (outer_cond);
>> -      update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);

> And moving this?

*nod*, it's the same (re)factoring.

>> This requires a single predecessor of the inner cond_bb.  */

> I'm a bit lost as to what the following now matches - it's certainly
> not only the
> special-case in the above comment.  Having the loop also appears to make
> it somewhat "unbound"?

We're walking up the chain of single-preds looking for an outer block
whose condition can be combined with the inner.

The comment still holds, but I guess pointing out that intervening
blocks may exist and be skipped.

> Isn't this also an improvement that's independent on the bitfield compare
> simplification?

Yes.

> If so can you split it out to make the overall patch smaller?

Heh.  Sure.  Funny again ;-)

The code for moving defs up and combining into the earlier condition
would probably also go into this separate patch, because without
intervening blocks, using the inner block is always ok, except when we
get a split condition, i.e., when the second condition accesses a field
that straddles over words, and the first condition accesses one of those
words.

Handling that correctly requires the code to move bits up, but it
complicates the patch splitting.  Would you prefer a larger patch
sequence, breaking up the pieces so that every patch generates ideal
code for the features it supports, or would an intermediate patch in
which we combine the split conditions at the inner block in a way that
loses the short-circuiting from andif/orif, but that a later patch in
the sequence restores, be ok?


>> if (safe_is_a <gcond *> (*gsi_last_bb (bb)))
>> -       if (tree_ssa_ifcombine_bb (bb))
>> +       if (basic_block outer_bb = tree_ssa_ifcombine_bb (bb))

> would it help if we'd perform this walk in a more defined manner?  In 
> particular
> after your more complex CFG matching is single-pred-before-succ order still
> appropriate?

Perhaps.  There's an apparent potential for quadratic behavior here, but
I'm not sure it is actually there.  If it is, a different order might or
might not make things better.

I'm not entirely sure that the combination into outer requires
reconsidering outer.  ISTM that, if we could have combined outer, we
would have already done that, and then the combination of inner would
involve the same third block.

I am sure, however, that reconsidering inner is necessary for some
optimizations, such as cases in which we split its condition, and then
we could combine part of the condition into one block, and another part
into another.

>> +           /* Go back to outer_bb, in case it could be further optimized, 
>> but
>> +              only at -O2+, since it could get quadratic.  */

> You're also walking back(?) in tree_ssa_ifcombine_bb now.  I also think
> single-pred-before-succ order is supposed to catch all 2nd level 
> opportunities?

No, we need at the very least to retry the inner block after a
successful combination, because of split conditions, and because in most
cases we bring the outer condition into the inner one.

The uncertainty to me is whether there's any case in which a combination
into outer makes room for outer to be further combined as inner.  I
suspect some split condition cases might, and I haven't been able to
convince myself otherwise, nor have I been able to find a case in which
it does (I had testing code to detect this).


Thanks again,

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Re: [PATCH] fold fold_truth_andor field merging into ifcombine

Reply via email to