On Tue, 11 Jul 2023, Jan Hubicka wrote: > > > > What I saw most wrecking the profile is when passes turn > > if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup > > which then simply deletes one of the outgoing edges without doing > > anything to the (guessed) profile. > > Yep, I agree that this is disturbing. At the cfg cleanup time one can > hardly do anything useful, since the knowledge of transform that caused > profile inconsistency is forgotten. however I think it is not a complete > disaster. > > With profile feedback the most common case of this happening is a > situation where we duplicated code (by inlining, unrolling etc.) into a > context where it behaves differently then the typical behaviour > represented by the profile. > > So if one ends up zapping edge with large probability, one also knows > that the code being optimized does not exhibit typical behaviour from > the train run and thus is not very hot. So profile inconsistency should > not affect performance that much. > > So doing nothing is IMO may end up being safter than trying to get the > in/out counts right without really known what is going on. > > This is mostly about the scenario "constant propagated this conditional > and profile disagrees with me". There are other cases where update is > IMO important. i.e. vectorizer forgetting to cap #of iterations of > epilogue may cause issue since the epilogue loop looks more frequent > than the main vectorized loop and it may cause IRA to insert spilling > into it or so. > > When we duplicate we have chance to figure out profile updates. > Also we may try to get as much as possible done early. > I think we should again do loop header copying that does not expand code > at early opts again. I have some more plans on cleaning up loop-ch and > then we can give it a try. > > With guessed profile we always have option to re-do the propagation. > There is TODO_rebuild_frequencies for that which we do after inlining. > This is mostly to handle possible overflows on large loops nests > constructed by inliner. > > We can re-propagate once again after late cleanup passes. Looking at the > queue, we have: > > NEXT_PASS (pass_remove_cgraph_callee_edges); > /* Initial scalar cleanups before alias computation. > They ensure memory accesses are not indirect wherever possible. */ > NEXT_PASS (pass_strip_predict_hints, false /* early_p */); > NEXT_PASS (pass_ccp, true /* nonzero_p */); > /* After CCP we rewrite no longer addressed locals into SSA > form if possible. */ > NEXT_PASS (pass_object_sizes); > NEXT_PASS (pass_post_ipa_warn); > /* Must run before loop unrolling. */ > NEXT_PASS (pass_warn_access, /*early=*/true); > NEXT_PASS (pass_complete_unrolli); > ^^^^ here we care about profile > NEXT_PASS (pass_backprop); > NEXT_PASS (pass_phiprop); > NEXT_PASS (pass_forwprop); > /* pass_build_alias is a dummy pass that ensures that we > execute TODO_rebuild_alias at this point. */ > NEXT_PASS (pass_build_alias); > NEXT_PASS (pass_return_slot); > NEXT_PASS (pass_fre, true /* may_iterate */); > NEXT_PASS (pass_merge_phi); > NEXT_PASS (pass_thread_jumps_full, /*first=*/true); > ^^^^ here > > By now we did CCP and FRE so we likely optimized out most of constant > conditionals exposed by inline.
So maybe we should simply delay re-propagation of the profile? I think cunrolli doesn't so much care about the profile - cunrolli is (was) about abstraction removal. Jump threading should be the first pass to care. Richard.