On Oct 10, 2024, Richard Biener <richard.guent...@gmail.com> wrote: > Thanks for working on this. There's #if 0 portions in the patch - did you > send the correct version?
'fraid so. Sorry, I'd forgotten about that bit. Long story (not so) short, this patch is a bit of a frankenstein monster: there's a large portion from the original fold_truth_andor patch, that deals with the bit field merging; there's a largish portion from a couple of years ago, in which I set out to reconstruct the mergeable expressions in gimple for ifcombine, but failed because vuses would unavoidably be present, and the latest recent try in which I went ahead with the bits that dropped vuses, and managed to introduce in ifcombine what I considered the most valuable contribution in this patch, namely, the combination of non-neighbor conditions. The logic from each phase was not really touched in subsequent phases, and years had elapsed, so the recollection of these cpp conditionals had slipped out of my mind. My swapped-back-in realization is that I took a copy of ssa_is_replaceable_p and made it ssa_is_substitutable_p, with some undesired conditions removed by #if 0 along wiht declarations only used in those bits. The idea back then was to highlight the differences, so that, when revisiting it, I coulud perhaps refactor and combine the functions if that made sense. I'm not sure it does. > I wonder if we can check the locations as to whether the spelling location > is different and suppress diagnostics when macro expansions were involved? Presumably. I had tracked down the ppc condition back in the fold_truth_andoor version of the patch, but the x86 hits are newer, and the line numbers didn't help me find the errors, so I don't even know whether it's macros that we're talking about. These new warnings were encouraging, for they showed that the ifcombine incarnation of the patch covered additional combinations that the fold-based one couldn't. But they only came up in the latest version, so I know the mutually exclusive conditions are not neighbors. But I digress. The relevant information here is that the line numbers are off, so I don't even know whether macros are involved. > Adding -Wno-error should be the last resort and I suspect such cases > happen in user code as well? I'm pretty sure this will come up in user code, yeah, but isn't that something we'd *want* to flag, rather than suppress? I mean, mutually exclusive conditions are probably something worth flagging and looking into even (especially?) if they come from macros. The fact that we missed them before doesn't strike me as enough of a reason to silence them. >> +/* Return TRUE if expression STMT is suitable for replacement. ??? >> + Same as ssa_is_replaceable_p, except that we don't insist it has a >> + single use. */ >> >> +static bool >> +ssa_is_substitutable_p (gimple *stmt) > I don't yet see how many times you use this - there's other passes > which need to query a subset of this like DCE or DSE for whether > a stmt can be elided. Only twice, in the bits from phase2 that recognize and reconstruct from gimple condition operand trees that could be merged. I realize now that perhaps gimple matching could do this job, so perhaps I can get rid of this. > This is mostly gimple_has_side_effects () but there's also > !cfun->can_delete_dead_exceptions && stmt_could_throw_p (cfun, stmt) Erhm, I don't think the paragraph above applies to ssa_is_substitutable_p. > I think the name is probably bad and suggests it's generally applicable, > so maybe rename it to something more specific for your use? ACK, will do if I don't get rid of it. TBH, I don't think I've given this modified version of tree-outof-ssa.cc's ssa_is_replaceable_p() much thought, I just knew I didn't mind combining/merging trees even if they had additional uses. Should I apply your suggestions for it to the original function? >> +static tree >> +is_cast_p (tree *name) > very short name, add a prefix? Will do if it survives. >> +{ >> + tree type = 0; >> + >> + while (TREE_CODE (*name) == SSA_NAME >> + && !SSA_NAME_IS_DEFAULT_DEF (*name)) >> + { >> + gimple *def = SSA_NAME_DEF_STMT (*name); >> + >> + if (!ssa_is_substitutable_p (def)) > Overly restrictive/expensive? Just check if it's a GIMPLE assign? This is from phase2, so the details escape me already, but I don't think that would be enough. We need to know that the definition is movable in principle, presumably to the point of use. If we combine conditions, we'll end up evaluating it at that point. >> + if (gimple_assign_single_p (def)) >> + { >> + if (gimple_assign_load_p (def)) >> + break; >> + *name = gimple_assign_rhs1 (def); > that could be a constant, or a REALPART_EXPR. Maybe move after ... >> + continue; >> + } >> + >> + if (!gimple_assign_cast_p (def)) >> + break; > ... this? I don't get what you're getting at here. IIRC this function recognizes gimple that, in the fold logic, used to come up in (bit)field extraction and manipulation for testing, particularly type widening and narrowing. It is supposed to preserve the preexisting logic. >> + >> + if (!type) >> + type = TREE_TYPE (*name); >> + *name = gimple_assign_rhs1 (def); >> + } >> + >> + return type; > Overall this is a very odd function as it strips even (widen)(truncate)x? It's not really stripping anything, just identifying the initial type of the field that underwent type casts. >> + /* We are interested in the bare arrangement of bits, so strip everything >> + that doesn't affect the machine mode. However, record the type of the >> + outermost expression if it may matter below. */ >> + outer_type = is_cast_p (&exp); > I think is_cast_p strips more than casts not affecting the machine mode. > It looks like you want a STRIP_NOPS that works on GIMPLE? I don't think so. IIRC this is modeled after preexisting code that did more than STRIP_NOPs, it followed CONVERT_EXPR_P as well. > I'm not sure if it would scale, but it looks like there's a certain > set of patterns > this tries to match - did you think of using a (match ...) pattern in match.pd > instead? I didn't. I recall looking into such pattern matching at some point, before getting started on ifcombine, but I didn't revisit it. The patterns I needed to recognize to preserve the fold functionality were so varied depending on details that I wasn't sure that would be useful. But I guess now I should give it a try. >> +static tree >> +unextend (tree c, int p, int unsignedp, tree mask) > I'd like all of the above to be done via wide_int, that has utilities to build > (shifted) masks and sign- or zero-extend from specific bits. Ugh :-) I was hoping I wouldn't have to revisit the bit-manipulation logic taken from the earliest version of the patch, let alone rewrite it :-) >> + tree ref = make_bit_field_ref (loc, unshare_expr (inner), >> + unshare_expr (orig_inner), >> + type, bitsize, bitpos, > I suppose 'inner' is what the earlier get_inner_reference returned - I don't > think this built load properly preserves TBAA properties, at least > a MEM[(char *)&a] would have an inner of 'a'? These bitfield references combine access to multiple fields of a. It's not very much unlike accessing the entirety of 'a'. What else could/should it be? >> +build_split_load (tree /* out */ ln_arg[2], >> + tree type = lang_hooks.types.type_for_size (bitsiz[i], 1); > Use type_for_mode? Hmm, I'm pretty sure I tried something like that but had to resort to this back in the first version. >> +static int >> +mergeable_loads_p (gimple *lstmt, gimple *rstmt) >> + if (stmt_dominates_stmt_p (lstmt, rstmt)) > Can you avoid these please? I'm not sure. IIRC we need to know which stmt dominates the other. But that's only to know where to insert prep stmts; maybe we can return them and leave it for the caller to gimplify as needed. > if (gimple_vuse (stmts[0]) != gimple_vuse (stmts[1])) > return 0; *doh*, yeah, right, thanks! > The function is large - maybe some factoring is possible? I also notice > you return a GENERIC expression here where having a GIMPLE value > and an alternate gimple_seq output for defining stmts would be better? It would have to be two seqs, to handle split cases. I'll give that a try. >> +static bool >> +constant_condition_p (gcond *cond) >> +{ >> + if (!cond) >> + return true; > It's a bit odd this returns true for no condition. Maybe > static_outgoing_edge_p ()? That name would work. I'll think of other possibilities. I didn't expect that covering the degenerate case would be a problem. > (what about a throwing stmt?) That would prevent ifcombine from considering the block. > IMO inlining the predicate to it's few(?) uses > avoids giving it a "proper" name ;) I'll consider that. Are you suggesting inlining the second constant_condition_p (that takes a bb), the one quoted above, or both? >> +/* Same as recognize_if_then_else, but check that the condition is not >> + constant. It is not useful to combine constant conditions. */ > I do wonder why we ever get to the situation with constant conditions, > thus why recognize_if_then_else should match them in the first place? Two situations: - when there's a straight-line block (that could be cleaned up) separating blocks with combinable conditions - when we're crossing a block whose condition has already been combined, looking for other conditions to combine in earlier blocks >> +static bool >> +recognize_if_succs (basic_block cond_bb, >> + basic_block succ1, basic_block succ2) > Can you use > basic_block tem_then = NULL, tem_else = NULL; > recognize_if_then_else (cond_bb, &tem_then, &tem_else); > or even > (recognize_if_then_else (cond_bb, succ1, succ2) > || recognize_if_then_else (cond_bb, succ2, succ1)) > at callers instead of adding another function? I could, I recall that it worked that way before I optimized it to avoid various tests that we don't need but the other function performs, and even the indirection and optional filling-in that we don't need. >> @@ -129,7 +183,7 @@ bb_no_side_effects_p (basic_block bb) >> enum tree_code rhs_code; >> if (gimple_has_side_effects (stmt) >> || gimple_could_trap_p (stmt) >> - || gimple_vuse (stmt) >> + /* || gimple_vuse (stmt) */ > I think you want to preserve || gimple_vdef (stmt) at least - this possibly > saves you from checking for intermediate stores. That's covered by gimple_has_side_effects, I hope. (checks) Hmm, apparently not. Comments in there, and even its name, are now very misleading. Yeah, looks like we need to rule out gimple_vdef. > As you special-case hard register uses, do we want to ever make hard-register > uses or defs unconditional? I'll note that on GIMPLE hard register uses/defs > are treated as memory, so they have virtual defs and uses. Erhm, do I? I don't recall that, and I can't tell what you're getting at. >> +/* Replace the conditions in INNER_COND and OUTER_COND with COND and COND2. >> + COND and COND2 are computed for insertion at INNER_COND, with OUTER_COND >> + replaced with a constant, but if there are intervening blocks, it's best >> to >> + adjust COND for insertion at OUTER_COND, placing COND2 at INNER_COND. */ > What kind of "intervening" blocks could there be? (a.f1 == b.f1 // bb A && x != y // bb B && a.f2 == b.f2 // bb C ... ) We may be able to combine C into A, without penalty, as long as B has no side effects that prevent the combination. >> +static tree >> +ifcombine_replace_cond (gcond *inner_cond, bool inner_inv, >> + gcond *outer_cond, bool outer_inv, >> + tree cond, bool must_canon, >> + tree cond2) >> +{ [...] >> + if (tree tcanon = canonicalize_cond_expr_cond (cond)) >> + cond = tcanon; >> + else if (must_canon) >> + return NULL_TREE; > ? I think you should use gimple_build or gimplify what was built to > a is_gimple_condexpr_for_cond like ifcombine does elsehwere. That bit is refactored from preexisting code. Some paths required canonicalization to succeed. I only preserved that. >> - /* Path outer_cond_bb->(outer2) needs to be merged into path >> - outer_cond_bb->(outer_to_inner)->inner_cond_bb->(inner_taken) >> - and probability of inner_not_taken updated. */ >> + /* Mark SSA DEFs that are referenced by cond and may thus need to be >> + moved to outer. */ >> + { >> + ifcombine_mark_ssa_name_t data = { used, outer_bb }; >> + walk_tree (&cond, ifcombine_mark_ssa_name_walk, &data, NULL); >> + } > I think this moving code should be an enhancement ontop of the basic > functionality - that is, I'd insert at inner_cond. Combining A into C (from my example above), as you suggest here, could penalize a carefully written sequence of conditions, so I'd rather not do that. It wouldn't even be an optimization. > The intervening blocks must > necessarily form straight-line code and thus we're doing "scheduling" here > which is premature on GIMPLE? Maybe it wasn't clear from the description that the intervening blocks could have multiple other unrelated conditions? >> + /* Leave CFG optimization to cfg_cleanup. */ >> + gimple_cond_set_condition_from_tree (outer_cond, >> + outer_inv >> + ? boolean_false_node >> + : boolean_true_node); > There is gimple_cond_make_{true,false} Yeah. This code was factored from ifcombine_ifandif. I don't like making unrelated changes during refactoring. I understand I'm to separate the refactoring, so I might as well make this cleanup (?) another separate patch. >> + if (!ifcombine_replace_cond (inner_cond, inner_inv, >> + outer_cond, outer_inv, >> + t, false, NULL_TREE)) > Can you split out factoring this function? Heh. Sure. Funny. I had it separate before I folded and consolidated the patch to submit it. >> - if (!is_gimple_condexpr_for_cond (t)) >> - { >> - gsi = gsi_for_stmt (inner_cond); >> - t = force_gimple_operand_gsi_1 (&gsi, t, >> is_gimple_condexpr_for_cond, >> - NULL, true, GSI_SAME_STMT); >> - } >> - gimple_cond_set_condition_from_tree (inner_cond, t); >> - update_stmt (inner_cond); >> - >> - /* Leave CFG optimization to cfg_cleanup. */ >> - gimple_cond_set_condition_from_tree (outer_cond, >> - outer_inv ? boolean_false_node : boolean_true_node); >> - update_stmt (outer_cond); >> - update_profile_after_ifcombine (inner_cond_bb, outer_cond_bb); > And moving this? *nod*, it's the same (re)factoring. >> This requires a single predecessor of the inner cond_bb. */ > I'm a bit lost as to what the following now matches - it's certainly > not only the > special-case in the above comment. Having the loop also appears to make > it somewhat "unbound"? We're walking up the chain of single-preds looking for an outer block whose condition can be combined with the inner. The comment still holds, but I guess pointing out that intervening blocks may exist and be skipped. > Isn't this also an improvement that's independent on the bitfield compare > simplification? Yes. > If so can you split it out to make the overall patch smaller? Heh. Sure. Funny again ;-) The code for moving defs up and combining into the earlier condition would probably also go into this separate patch, because without intervening blocks, using the inner block is always ok, except when we get a split condition, i.e., when the second condition accesses a field that straddles over words, and the first condition accesses one of those words. Handling that correctly requires the code to move bits up, but it complicates the patch splitting. Would you prefer a larger patch sequence, breaking up the pieces so that every patch generates ideal code for the features it supports, or would an intermediate patch in which we combine the split conditions at the inner block in a way that loses the short-circuiting from andif/orif, but that a later patch in the sequence restores, be ok? >> if (safe_is_a <gcond *> (*gsi_last_bb (bb))) >> - if (tree_ssa_ifcombine_bb (bb)) >> + if (basic_block outer_bb = tree_ssa_ifcombine_bb (bb)) > would it help if we'd perform this walk in a more defined manner? In > particular > after your more complex CFG matching is single-pred-before-succ order still > appropriate? Perhaps. There's an apparent potential for quadratic behavior here, but I'm not sure it is actually there. If it is, a different order might or might not make things better. I'm not entirely sure that the combination into outer requires reconsidering outer. ISTM that, if we could have combined outer, we would have already done that, and then the combination of inner would involve the same third block. I am sure, however, that reconsidering inner is necessary for some optimizations, such as cases in which we split its condition, and then we could combine part of the condition into one block, and another part into another. >> + /* Go back to outer_bb, in case it could be further optimized, >> but >> + only at -O2+, since it could get quadratic. */ > You're also walking back(?) in tree_ssa_ifcombine_bb now. I also think > single-pred-before-succ order is supposed to catch all 2nd level > opportunities? No, we need at the very least to retry the inner block after a successful combination, because of split conditions, and because in most cases we bring the outer condition into the inner one. The uncertainty to me is whether there's any case in which a combination into outer makes room for outer to be further combined as inner. I suspect some split condition cases might, and I haven't been able to convince myself otherwise, nor have I been able to find a case in which it does (I had testing code to detect this). Thanks again, -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive