> > On 16.11.2024 at 14:08, Jan Hubicka <hubi...@ucw.cz> wrote:
> >
> > > Ignore conditions guarding __builtin_unreachable in inliner metrics
> > >
> > > This patch extends my attempt from last year to make the inliner
> > > metrics ignore conditionals guarding __builtin_unreachable. Compared
> > > to the previous patch, this one implements a "mini-dce" in
> > > ipa-fnsummary to avoid accounting all statements that are only used
> > > to determine conditionals guarding __builtin_unreachable.
> > > These will be removed later once value ranges are determined.
> > >
> > > While working on this, I noticed that we do have a lot of dead code
> > > while computing fnsummary for early inline. Those are only used to
> > > apply large-function growth, but it seems there is enough dead code
> > > to make this value kind of irrelevant. Also there seems to be quite
> > > a lot of const/pure calls that can be cheaply removed before we
> > > inline them. So I wonder if we want to run one DCE before early
> > > inlining.
> >
> > I would not have expected a 'lot' of dead const function calls. By the
> > same argument we should rather run CCP before inlining, as that tends
> > to prune most dead code early?

Just to quantify "a lot": on tramp3d there are 2263 calls declared dead
and 34206 declared live (by my mini-dce) in fnsummary1, so about 6% of
calls are dead. Overall there are 3797 unnecessary stmts and 97263
necessary ones, so most of the dead stuff (2263 of 3797) is actually
calls.
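
To make it clearer what "declared dead" means above, here is a rough
sketch of the kind of mark phase such a mini-dce could use. This is not
the code from the patch. It works on a toy statement list, and every
name in it (Stmt, mark_necessary, count_dead_and_live, example_body) is
made up for illustration and unrelated to GCC's gimple API. It assumes
that statements with side effects are the roots, that a conditional
guarding only an unreachable block is not a root, and that necessity
propagates from a statement to the definitions of its operands.

```cpp
#include <cstddef>
#include <vector>

// Toy statement. Hypothetical layout, not GCC's gimple representation.
struct Stmt {
    int lhs;                  // SSA-like name defined here, or -1 if none
    std::vector<int> uses;    // names read by this statement
    bool is_call;             // true for call statements
    bool side_effects;        // stores, returns, control flow, non-const/pure calls
    bool guards_unreachable;  // conditional that only guards an unreachable block
};

struct Counts {
    int dead_calls = 0, live_calls = 0;
    int dead_stmts = 0, live_stmts = 0;
};

// Mark phase: seed with root statements, then mark the definitions of
// everything a necessary statement reads.
std::vector<bool> mark_necessary(const std::vector<Stmt>& body) {
    std::vector<bool> necessary(body.size(), false);
    std::vector<std::size_t> worklist;

    for (std::size_t i = 0; i < body.size(); ++i) {
        // Assumption: guards of unreachable code are never roots.
        if (body[i].side_effects && !body[i].guards_unreachable) {
            necessary[i] = true;
            worklist.push_back(i);
        }
    }

    while (!worklist.empty()) {
        std::size_t i = worklist.back();
        worklist.pop_back();
        for (int name : body[i].uses) {
            // Linear search for the defining statement; fine for a sketch.
            for (std::size_t d = 0; d < body.size(); ++d) {
                if (body[d].lhs == name && !necessary[d]) {
                    necessary[d] = true;
                    worklist.push_back(d);
                }
            }
        }
    }
    return necessary;
}

// Tally dead/live statements and calls, like the numbers quoted above.
Counts count_dead_and_live(const std::vector<Stmt>& body) {
    std::vector<bool> necessary = mark_necessary(body);
    Counts c;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (necessary[i]) {
            ++c.live_stmts;
            if (body[i].is_call) ++c.live_calls;
        } else {
            ++c.dead_stmts;
            if (body[i].is_call) ++c.dead_calls;
        }
    }
    return c;
}

// Small made-up function body used to exercise the sketch.
std::vector<Stmt> example_body() {
    return {
        {1, {}, false, false, false},   // _1 = load
        {2, {1}, false, false, false},  // _2 = _1 > 10
        {-1, {2}, false, true, true},   // if (_2) __builtin_unreachable ()
        {-1, {}, true, false, false},   // const call, no LHS
        {3, {}, true, false, false},    // _3 = const call
        {-1, {3}, false, true, false},  // return _3
    };
}
```

On example_body () only the return and the call feeding it end up
necessary; the guard, the two statements computing its condition, and
the const call without LHS are all counted as dead.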
In IPA fnsummary there are 4 dead calls, all of them .part clones. This
looks like a missed optimization before ipa-fnsplit.

These are the most frequent ones:

     12 skipping unnecesary stmt Evaluator<RemoteMultiPatchEvaluatorTag>::Evaluator (&evaluator);
     12 skipping unnecesary stmt Pooma::DummyMutex::lock (_1);
     16 skipping unnecesary stmt ForEach<Scalar<double>, DomainFunctorTag, DomainFunctorTag>::apply (_2, f_7(D), c_8(D));
     23 skipping unnecesary stmt operator delete (_5, _2);
     24 skipping unnecesary stmt Evaluator<RemoteMultiPatchEvaluatorTag>::~Evaluator (&evaluator);
     28 skipping unnecesary stmt ForEach<Scalar<double>, DomainFunctorTag, DomainFunctorTag>::apply (_1, f_6(D), c_7(D));
     37 skipping unnecesary stmt _37 = Field<UniformRectilinearMesh<MeshTraits<3, double, UniformRectilinearTag, CartesianTag, 3> >, double, BrickView>::engine (_9);
     37 skipping unnecesary stmt _39 = engineFunctor<Engine<3, double, BrickView>, DataObjectRequest<BlockAffinity> > (_10, &getAffinity);
     43 skipping unnecesary stmt DataObjectRequest<BlockAffinity>::DataObjectRequest (&getAffinity);
     43 skipping unnecesary stmt MultiArgEvaluator<SinglePatchEvaluatorTag>::MultiArgEvaluator (&speval);
     43 skipping unnecesary stmt Smarts::Iterate<Smarts::Stub>::hintAffinity (_8, _11);
     86 skipping unnecesary stmt MultiArgEvaluator<SinglePatchEvaluatorTag>::~MultiArgEvaluator (&speval);
    227 skipping unnecesary stmt PoomaCTAssert<true>::test ();

Mostly those are functions which are empty at release_ssa time.
Perhaps we could special-case pure/const calls with no LHS and get rid
of them during early inlining. This is cheaper than doing an actual
inline, which triggers a lot of logic in tree-inline...

Honza