On Mon, Oct 21, 2024 at 4:30 AM Alexandre Oliva <ol...@adacore.com> wrote:
>
> On Oct 10, 2024, Richard Biener <richard.guent...@gmail.com> wrote:
>
> > Thanks for working on this.  There's #if 0 portions in the patch - did you
> > send the correct version?
>
> 'fraid so.  Sorry, I'd forgotten about that bit.  Long story (not so)
> short, this patch is a bit of a frankenstein monster: there's a large
> portion from the original fold_truth_andor patch, that deals with the
> bit field merging; there's a largish portion from a couple of years ago,
> in which I set out to reconstruct the mergeable expressions in gimple
> for ifcombine, but failed because vuses would unavoidably be present,
> and the latest recent try in which I went ahead with the bits that
> dropped vuses, and managed to introduce in ifcombine what I considered
> the most valuable contribution in this patch, namely, the combination of
> non-neighbor conditions.  The logic from each phase was not really
> touched in subsequent phases, and years had elapsed, so the recollection
> of these cpp conditionals had slipped out of my mind.
>
> My swapped-back-in realization is that I took a copy of
> ssa_is_replaceable_p and made it ssa_is_substitutable_p, with some
> undesired conditions removed by #if 0 along wiht declarations only used
> in those bits.  The idea back then was to highlight the differences, so
> that, when revisiting it, I coulud perhaps refactor and combine the
> functions if that made sense.  I'm not sure it does.
>
> > I wonder if we can check the locations as to whether the spelling location
> > is different and suppress diagnostics when macro expansions were involved?
>
> Presumably.  I had tracked down the ppc condition back in the
> fold_truth_andoor version of the patch, but the x86 hits are newer, and
> the line numbers didn't help me find the errors, so I don't even know
> whether it's macros that we're talking about.  These new warnings were
> encouraging, for they showed that the ifcombine incarnation of the patch
> covered additional combinations that the fold-based one couldn't.  But
> they only came up in the latest version, so I know the mutually
> exclusive conditions are not neighbors.
>
> But I digress.  The relevant information here is that the line numbers
> are off, so I don't even know whether macros are involved.
>
> > Adding -Wno-error should be the last resort and I suspect such cases
> > happen in user code as well?
>
> I'm pretty sure this will come up in user code, yeah, but isn't that
> something we'd *want* to flag, rather than suppress?  I mean, mutually
> exclusive conditions are probably something worth flagging and looking
> into even (especially?) if they come from macros.  The fact that we
> missed them before doesn't strike me as enough of a reason to silence
> them.
>
> >> +/* Return TRUE if expression STMT is suitable for replacement.  ???
> >> +   Same as ssa_is_replaceable_p, except that we don't insist it has a
> >> +   single use.  */
> >>
> >> +static bool
> >> +ssa_is_substitutable_p (gimple *stmt)
>
> > I don't yet see how many times you use this - there's other passes
> > which need to query a subset of this like DCE or DSE for whether
> > a stmt can be elided.
>
> Only twice, in the bits from phase2 that recognize and reconstruct from
> gimple condition operand trees that could be merged.  I realize now that
> perhaps gimple matching could do this job, so perhaps I can get rid of
> this.
>
> > This is mostly gimple_has_side_effects () but there's also
> > !cfun->can_delete_dead_exceptions && stmt_could_throw_p (cfun, stmt)
>
> Erhm, I don't think the paragraph above applies to
> ssa_is_substitutable_p.
>
> > I think the name is probably bad and suggests it's generally applicable,
> > so maybe rename it to something more specific for your use?
>
> ACK, will do if I don't get rid of it.
>
> TBH, I don't think I've given this modified version of
> tree-outof-ssa.cc's ssa_is_replaceable_p() much thought, I just knew I
> didn't mind combining/merging trees even if they had additional uses.
>
> Should I apply your suggestions for it to the original function?
>
> >> +static tree
> >> +is_cast_p (tree *name)
>
> > very short name, add a prefix?
>
> Will do if it survives.
>
> >> +{
> >> +  tree type = 0;
> >> +
> >> +  while (TREE_CODE (*name) == SSA_NAME
> >> +        && !SSA_NAME_IS_DEFAULT_DEF (*name))
> >> +    {
> >> +      gimple *def = SSA_NAME_DEF_STMT (*name);
> >> +
> >> +      if (!ssa_is_substitutable_p (def))
>
> > Overly restrictive/expensive?  Just check if it's a GIMPLE assign?
>
> This is from phase2, so the details escape me already, but I don't think
> that would be enough.  We need to know that the definition is movable in
> principle, presumably to the point of use.  If we combine conditions,
> we'll end up evaluating it at that point.
>
> >> +      if (gimple_assign_single_p (def))
> >> +       {
> >> +         if (gimple_assign_load_p (def))
> >> +           break;
> >> +         *name = gimple_assign_rhs1 (def);
>
> > that could be a constant, or a REALPART_EXPR.  Maybe move after ...
>
> >> +         continue;
> >> +       }
> >> +
> >> +      if (!gimple_assign_cast_p (def))
> >> +       break;
>
> > ... this?
>
> I don't get what you're getting at here.
>
> IIRC this function recognizes gimple that, in the fold logic, used to
> come up in (bit)field extraction and manipulation for testing,
> particularly type widening and narrowing.  It is supposed to preserve
> the preexisting logic.

The GIMPLE variant of that you came up with looks overly broad to
the extent of being wrong (in the wrong-code sense), but of course
it's always difficult to tell when looking at this piece-wise.

> >> +
> >> +      if (!type)
> >> +       type = TREE_TYPE (*name);
> >> +      *name = gimple_assign_rhs1 (def);
> >> +    }
> >> +
> >> +  return type;
>
> > Overall this is a very odd function as it strips even (widen)(truncate)x?
>
> It's not really stripping anything, just identifying the initial type of
> the field that underwent type casts.
>
> >> +  /* We are interested in the bare arrangement of bits, so strip 
> >> everything
> >> +     that doesn't affect the machine mode.  However, record the type of 
> >> the
> >> +     outermost expression if it may matter below.  */
> >> +  outer_type = is_cast_p (&exp);
>
> > I think is_cast_p strips more than casts not affecting the machine mode.
> > It looks like you want a STRIP_NOPS that works on GIMPLE?
>
> I don't think so.  IIRC this is modeled after preexisting code that did
> more than STRIP_NOPs, it followed CONVERT_EXPR_P as well.
>
> > I'm not sure if it would scale, but it looks like there's a certain
> > set of patterns
> > this tries to match - did you think of using a (match ...) pattern in 
> > match.pd
> > instead?
>
> I didn't.  I recall looking into such pattern matching at some point,
> before getting started on ifcombine, but I didn't revisit it.  The
> patterns I needed to recognize to preserve the fold functionality were
> so varied depending on details that I wasn't sure that would be useful.
> But I guess now I should give it a try.
>
> >> +static tree
> >> +unextend (tree c, int p, int unsignedp, tree mask)
>
> > I'd like all of the above to be done via wide_int, that has utilities to 
> > build
> > (shifted) masks and sign- or zero-extend from specific bits.
>
> Ugh :-) I was hoping I wouldn't have to revisit the bit-manipulation
> logic taken from the earliest version of the patch, let alone rewrite it
> :-)
>
> >> +  tree ref = make_bit_field_ref (loc, unshare_expr (inner),
> >> +                                unshare_expr (orig_inner),
> >> +                                type, bitsize, bitpos,
>
> > I suppose 'inner' is what the earlier get_inner_reference returned - I don't
> > think this built load properly preserves TBAA properties, at least
> > a MEM[(char *)&a] would have an inner of 'a'?
>
> These bitfield references combine access to multiple fields of a.  It's
> not very much unlike accessing the entirety of 'a'.  What else
> could/should it be?

It should be a semantically valid transform ;)  It probably falls apart
only for "interesting" cases like in-place construction of a different
typed entity using the storage of 'a' but nevertheless - given 'inner'
isn't necessarily a declaration but could be a pointer dereference
(of aggregate type) this concern is valid.

> >> +build_split_load (tree /* out */ ln_arg[2],
>
> >> +      tree type = lang_hooks.types.type_for_size (bitsiz[i], 1);
>
> > Use type_for_mode?
>
> Hmm, I'm pretty sure I tried something like that but had to resort to
> this back in the first version.
>
> >> +static int
> >> +mergeable_loads_p (gimple *lstmt, gimple *rstmt)
>
> >> +  if (stmt_dominates_stmt_p (lstmt, rstmt))
>
> > Can you avoid these please?
>
> I'm not sure.  IIRC we need to know which stmt dominates the other.  But
> that's only to know where to insert prep stmts; maybe we can return them
> and leave it for the caller to gimplify as needed.
>
> >   if (gimple_vuse (stmts[0]) != gimple_vuse (stmts[1]))
> >     return 0;
>
> *doh*, yeah, right, thanks!
>
> > The function is large - maybe some factoring is possible?  I also notice
> > you return a GENERIC expression here where having a GIMPLE value
> > and an alternate gimple_seq output for defining stmts would be better?
>
> It would have to be two seqs, to handle split cases.
> I'll give that a try.
>
> >> +static bool
> >> +constant_condition_p (gcond *cond)
> >> +{
> >> +  if (!cond)
> >> +    return true;
>
> > It's a bit odd this returns true for no condition. Maybe
> > static_outgoing_edge_p ()?
>
> That name would work.  I'll think of other possibilities.  I didn't
> expect that covering the degenerate case would be a problem.
>
> > (what about a throwing stmt?)
>
> That would prevent ifcombine from considering the block.
>
> > IMO inlining the predicate to it's few(?) uses
> > avoids giving it a "proper" name ;)
>
> I'll consider that.  Are you suggesting inlining the second
> constant_condition_p (that takes a bb), the one quoted above, or both?

The one with the weird semantics (no condition) at least.  When reading
code I always prefer slightly longer but "standard API" code over lots
of little "pass specific" helpers that make the code smaller but makes you
need to lookup the exact semantics of said uncommon predicates.

> >> +/* Same as recognize_if_then_else, but check that the condition is not
> >> +   constant.  It is not useful to combine constant conditions.  */
>
> > I do wonder why we ever get to the situation with constant conditions,
> > thus why recognize_if_then_else should match them in the first place?
>
> Two situations:
>
> - when there's a straight-line block (that could be cleaned up)
>   separating blocks with combinable conditions
>
> - when we're crossing a block whose condition has already been combined,
>   looking for other conditions to combine in earlier blocks

Ah, ok.

> >> +static bool
> >> +recognize_if_succs (basic_block cond_bb,
> >> +                   basic_block succ1, basic_block succ2)
>
> > Can you use
>
> >   basic_block tem_then = NULL, tem_else = NULL;
> >   recognize_if_then_else (cond_bb, &tem_then, &tem_else);
>
> > or even
>
> >    (recognize_if_then_else (cond_bb, succ1, succ2)
> >     || recognize_if_then_else (cond_bb, succ2, succ1))
>
> > at callers instead of adding another function?
>
> I could, I recall that it worked that way before I optimized it to avoid
> various tests that we don't need but the other function performs, and
> even the indirection and optional filling-in that we don't need.
>
>
> >> @@ -129,7 +183,7 @@ bb_no_side_effects_p (basic_block bb)
> >> enum tree_code rhs_code;
> >> if (gimple_has_side_effects (stmt)
> >> || gimple_could_trap_p (stmt)
> >> -         || gimple_vuse (stmt)
> >> +         /* || gimple_vuse (stmt) */
>
> > I think you want to preserve || gimple_vdef (stmt) at least - this possibly
> > saves you from checking for intermediate stores.
>
> That's covered by gimple_has_side_effects, I hope.
> (checks)
> Hmm, apparently not.
> Comments in there, and even its name, are now very misleading.

gimple_has_side_effects says "effects not represented in the IL",
it doesn't mean side-effect in the sense of how it's used by language standards.

> Yeah, looks like we need to rule out gimple_vdef.
>
> > As you special-case hard register uses, do we want to ever make 
> > hard-register
> > uses or defs unconditional?  I'll note that on GIMPLE hard register 
> > uses/defs
> > are treated as memory, so they have virtual defs and uses.
>
> Erhm, do I?  I don't recall that, and I can't tell what you're getting
> at.

Somewhere you tested DECL_HARD_REGISTER?

> >> +/* Replace the conditions in INNER_COND and OUTER_COND with COND and 
> >> COND2.
> >> +   COND and COND2 are computed for insertion at INNER_COND, with 
> >> OUTER_COND
> >> +   replaced with a constant, but if there are intervening blocks, it's 
> >> best to
> >> +   adjust COND for insertion at OUTER_COND, placing COND2 at INNER_COND.  
> >> */
>
> > What kind of "intervening" blocks could there be?
>
> (a.f1 == b.f1 // bb A
>  && x != y // bb B
>  && a.f2 == b.f2 // bb C
> ... )
>
> We may be able to combine C into A, without penalty, as long as B has no
> side effects that prevent the combination.
>
> >> +static tree
> >> +ifcombine_replace_cond (gcond *inner_cond, bool inner_inv,
> >> +                       gcond *outer_cond, bool outer_inv,
> >> +                       tree cond, bool must_canon,
> >> +                       tree cond2)
> >> +{
> [...]
> >> +  if (tree tcanon = canonicalize_cond_expr_cond (cond))
> >> +    cond = tcanon;
> >> +  else if (must_canon)
> >> +    return NULL_TREE;
>
> > ?  I think you should use gimple_build or gimplify what was built to
> > a is_gimple_condexpr_for_cond like ifcombine does elsehwere.
>
> That bit is refactored from preexisting code.  Some paths required
> canonicalization to succeed.  I only preserved that.
>
> >> -  /* Path outer_cond_bb->(outer2) needs to be merged into path
> >> -     outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken)
> >> -     and probability of inner_not_taken updated.  */
> >> +       /* Mark SSA DEFs that are referenced by cond and may thus need to 
> >> be
> >> +          moved to outer.  */
> >> +       {
> >> +         ifcombine_mark_ssa_name_t data = { used, outer_bb };
> >> +         walk_tree (&cond, ifcombine_mark_ssa_name_walk, &data, NULL);
> >> +       }
>
> > I think this moving code should be an enhancement ontop of the basic
> > functionality - that is, I'd insert at inner_cond.
>
> Combining A into C (from my example above), as you suggest here, could
> penalize a carefully written sequence of conditions, so I'd rather not
> do that.  It wouldn't even be an optimization.

Hmm, OK.  Still this makes the whole thing quite compliated - none of
if-combine currently tries to "re-associate" such cases.  I think this feature
addition should be split out at least.

> > The intervening blocks must
> > necessarily form straight-line code and thus we're doing "scheduling" here
> > which is premature on GIMPLE?
>
> Maybe it wasn't clear from the description that the intervening blocks
> could have multiple other unrelated conditions?

Yes, that wasn't clear.

> >> +      /* Leave CFG optimization to cfg_cleanup.  */
> >> +      gimple_cond_set_condition_from_tree (outer_cond,
> >> +                                          outer_inv
> >> +                                          ? boolean_false_node
> >> +                                          : boolean_true_node);
>
> > There is gimple_cond_make_{true,false}
>
> Yeah.  This code was factored from ifcombine_ifandif.
> I don't like making unrelated changes during refactoring.
>
> I understand I'm to separate the refactoring, so I might as well make
> this cleanup (?) another separate patch.
>
> >> +      if (!ifcombine_replace_cond (inner_cond, inner_inv,
> >> +                                  outer_cond, outer_inv,
> >> +                                  t, false, NULL_TREE))
>
> > Can you split out factoring this function?
>
> Heh.  Sure.  Funny.  I had it separate before I folded and consolidated
> the patch to submit it.
>
> >> -      if (!is_gimple_condexpr_for_cond (t))
> >> -       {
> >> -         gsi = gsi_for_stmt (inner_cond);
> >> -         t = force_gimple_operand_gsi_1 (&gsi, t, 
> >> is_gimple_condexpr_for_cond,
> >> -                                         NULL, true, GSI_SAME_STMT);
> >> -       }
> >> -      gimple_cond_set_condition_from_tree (inner_cond, t);
> >> -      update_stmt (inner_cond);
> >> -
> >> -      /* Leave CFG optimization to cfg_cleanup.  */
> >> -      gimple_cond_set_condition_from_tree (outer_cond,
> >> -       outer_inv ? boolean_false_node : boolean_true_node);
> >> -      update_stmt (outer_cond);
> >> -      update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb);
>
> > And moving this?
>
> *nod*, it's the same (re)factoring.
>
> >> This requires a single predecessor of the inner cond_bb.  */
>
> > I'm a bit lost as to what the following now matches - it's certainly
> > not only the
> > special-case in the above comment.  Having the loop also appears to make
> > it somewhat "unbound"?
>
> We're walking up the chain of single-preds looking for an outer block
> whose condition can be combined with the inner.
>
> The comment still holds, but I guess pointing out that intervening
> blocks may exist and be skipped.
>
> > Isn't this also an improvement that's independent on the bitfield compare
> > simplification?
>
> Yes.
>
> > If so can you split it out to make the overall patch smaller?
>
> Heh.  Sure.  Funny again ;-)
>
> The code for moving defs up and combining into the earlier condition
> would probably also go into this separate patch, because without
> intervening blocks, using the inner block is always ok, except when we
> get a split condition, i.e., when the second condition accesses a field
> that straddles over words, and the first condition accesses one of those
> words.
>
> Handling that correctly requires the code to move bits up, but it
> complicates the patch splitting.  Would you prefer a larger patch
> sequence, breaking up the pieces so that every patch generates ideal
> code for the features it supports, or would an intermediate patch in
> which we combine the split conditions at the inner block in a way that
> loses the short-circuiting from andif/orif, but that a later patch in
> the sequence restores, be ok?

I think intermediate non-optimal code is OK.  I think doing the general
if-combine enhancements (condition re-association) first would be good,
WRT the code motion I defer to you if it makes sense to disentangle that
from the re-association or not.

I'll note that you do the transforms "greedily" rather than re-constructing
a combined condition from the CFG and find the "global" best combinations.
On my overly long TODO (or rather "wishlist") is also the ability to
do condition re-association _without_ necessarily combining, for example
when profile feedback identifies an easy to predict currently inner condition
we want to test that first (turn it to the outer condition).  Iff the whole
condition would be combined then the scalar reassoc pass would also
re-associate conditions to do "related" checks together, like transform
(a == 1) || (c == 3) || (a == 7) to (a == 1) || (a == 7) || (c == 3).
Being able
to identify the CFG part that can be freely associated (no side-effects
intervening) this way is a first step thats somehow hidden in your patch.

>
> >> if (safe_is_a <gcond *> (*gsi_last_bb (bb)))
> >> -       if (tree_ssa_ifcombine_bb (bb))
> >> +       if (basic_block outer_bb = tree_ssa_ifcombine_bb (bb))
>
> > would it help if we'd perform this walk in a more defined manner?  In 
> > particular
> > after your more complex CFG matching is single-pred-before-succ order still
> > appropriate?
>
> Perhaps.  There's an apparent potential for quadratic behavior here, but
> I'm not sure it is actually there.  If it is, a different order might or
> might not make things better.
>
> I'm not entirely sure that the combination into outer requires
> reconsidering outer.  ISTM that, if we could have combined outer, we
> would have already done that, and then the combination of inner would
> involve the same third block.
>
> I am sure, however, that reconsidering inner is necessary for some
> optimizations, such as cases in which we split its condition, and then
> we could combine part of the condition into one block, and another part
> into another.
>
> >> +           /* Go back to outer_bb, in case it could be further optimized, 
> >> but
> >> +              only at -O2+, since it could get quadratic.  */
>
> > You're also walking back(?) in tree_ssa_ifcombine_bb now.  I also think
> > single-pred-before-succ order is supposed to catch all 2nd level 
> > opportunities?
>
> No, we need at the very least to retry the inner block after a
> successful combination, because of split conditions, and because in most
> cases we bring the outer condition into the inner one.
>
> The uncertainty to me is whether there's any case in which a combination
> into outer makes room for outer to be further combined as inner.  I
> suspect some split condition cases might, and I haven't been able to
> convince myself otherwise, nor have I been able to find a case in which
> it does (I had testing code to detect this).

I see.  Maybe we want to put some hard limit on the amount of work the
pass does trying to form combinations.

Thanks,
Richard.

>
> Thanks again,
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Reply via email to