On Tue, 11 Jul 2023, Jan Hubicka wrote:

> > 
> > What I saw most wrecking the profile is when passes turn
> > if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup
> > which then simply deletes one of the outgoing edges without doing
> > anything to the (guessed) profile.
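
For concreteness, a minimal hypothetical example of that kind of fold
(not a testcase from this thread): after early inlining the condition
below becomes a literal constant, the test is rewritten to if (1), and
CFG cleanup deletes the dead outgoing edge together with whatever
guessed probability it carried.

  /* Hypothetical illustration; compile e.g. with
     gcc -O2 -fdump-tree-optimized example.c  */

  static inline int use_fast_path (int n)
  {
    return n < 16;
  }

  int
  sum_small (const int *a)
  {
    int s = 0;
    /* After inlining, use_fast_path (8) folds to 1, so the else edge
       is simply removed without adjusting the guessed profile.  */
    if (use_fast_path (8))
      for (int i = 0; i < 8; i++)
        s += a[i];
    else
      s = -1;
    return s;
  }
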
> 
> Yep, I agree that this is disturbing.  At cfg cleanup time one can
> hardly do anything useful, since the knowledge of the transform that
> caused the profile inconsistency is forgotten.  However, I think it
> is not a complete
> disaster.
> 
> With profile feedback the most common case of this happening is a
> situation where we duplicated code (by inlining, unrolling etc.) into a
> context where it behaves differently than the typical behaviour
> represented by the profile.
> 
> So if one ends up zapping an edge with large probability, one also knows
> that the code being optimized does not exhibit typical behaviour from
> the train run and thus is not very hot.  So profile inconsistency should
> not affect performance that much.
> 
> So doing nothing may IMO end up being safer than trying to get the
> in/out counts right without really knowing what is going on.
> 
> This is mostly about the scenario "constant propagated this conditional
> and profile disagrees with me".  There are other cases where update is
> IMO important.  E.g. the vectorizer forgetting to cap the number of
> iterations of the epilogue may cause issues, since the epilogue loop
> then looks more frequent than the main vectorized loop and that may
> cause IRA to insert spilling into it or so.
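
A minimal hypothetical sketch of that situation: for the loop below the
vectorizer emits a main loop processing VF elements per iteration plus
a scalar epilogue for the remaining n % VF elements, so the epilogue
executes at most VF-1 iterations per call and its expected trip count
should be capped accordingly.

  /* Hypothetical example; try gcc -O3 -fdump-tree-vect-details to see
     the main vectorized loop and its scalar epilogue.  */

  void
  saxpy (float *restrict y, const float *restrict x, float a, int n)
  {
    for (int i = 0; i < n; i++)
      y[i] = y[i] + a * x[i];
  }
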
> 
> When we duplicate we have a chance to figure out profile updates.
> Also we may try to get as much as possible done early.
> I think we should again do loop header copying that does not expand code
> at early opts.  I have some more plans on cleaning up loop-ch and
> then we can give it a try.
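
For reference, the kind of early duplication meant here (hypothetical
example): loop header copying turns a while-style loop into a guarded
do-while by copying the header test, and doing that early fits the idea
of getting as much duplication as possible done while the guessed
profile can still be updated consistently.

  /* Hypothetical example; the ch pass duplicates the header test in
     front of the loop.  Try gcc -O2 -fdump-tree-ch (dump names may
     differ between GCC versions).  */

  int
  count_nonzero (const int *a, int n)
  {
    int c = 0;
    while (n-- > 0)     /* this test gets copied before the loop */
      if (*a++ != 0)
        c++;
    return c;
  }
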
> 
> With guessed profile we always have option to re-do the propagation.
> There is TODO_rebuild_frequencies for that which we do after inlining.
> This is mostly to handle possible overflows on large loop nests
> constructed by the inliner.
> 
> We can re-propagate once again after late cleanup passes. Looking at the
> queue, we have:
> 
>       NEXT_PASS (pass_remove_cgraph_callee_edges);
>       /* Initial scalar cleanups before alias computation.
>          They ensure memory accesses are not indirect wherever possible.  */
>       NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
>       NEXT_PASS (pass_ccp, true /* nonzero_p */);
>       /* After CCP we rewrite no longer addressed locals into SSA
>          form if possible.  */
>       NEXT_PASS (pass_object_sizes);
>       NEXT_PASS (pass_post_ipa_warn);
>       /* Must run before loop unrolling.  */
>       NEXT_PASS (pass_warn_access, /*early=*/true);
>       NEXT_PASS (pass_complete_unrolli);
> ^^^^ here we care about profile
>       NEXT_PASS (pass_backprop);
>       NEXT_PASS (pass_phiprop);
>       NEXT_PASS (pass_forwprop);
>       /* pass_build_alias is a dummy pass that ensures that we
>          execute TODO_rebuild_alias at this point.  */
>       NEXT_PASS (pass_build_alias);
>       NEXT_PASS (pass_return_slot);
>       NEXT_PASS (pass_fre, true /* may_iterate */);
>       NEXT_PASS (pass_merge_phi);
>       NEXT_PASS (pass_thread_jumps_full, /*first=*/true);
> ^^^^ here
> 
> By now we did CCP and FRE, so we likely optimized out most of the
> constant conditionals exposed by inlining.

So maybe we should simply delay re-propagation of the profile?  I
think cunrolli doesn't care that much about the profile; cunrolli
is (was) about abstraction removal.  Jump threading should be
the first pass to care.
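For instance (hypothetical testcase), the second test below becomes
redundant on each threaded path, and the block copies the threader
creates only make sense if the incoming counts are split consistently:

  /* Hypothetical example; the full threader (dump name threadfull1 in
     recent GCC, which may change) duplicates the paths so the second
     test of x disappears, which requires distributing the counts over
     the copies.  Try: gcc -O2 -fdump-tree-threadfull1 example.c  */

  int
  select_and_bump (int x, int a, int b)
  {
    int r;
    if (x > 0)
      r = a;
    else
      r = b;
    if (x > 0)          /* known true/false on each incoming path */
      r += 1;
    return r;
  }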

Richard.
