On Mon, Oct 20, 2014 at 12:02 AM, Xinliang David Li <davi...@google.com> wrote:
> On Sat, Oct 18, 2014 at 4:19 PM, Xinliang David Li <davi...@google.com> wrote:
>> On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>> The difference in instrumentation runtime is huge -- as topn profiler
>>>> is pretty expensive to run.
>>>>
>>>> With FDO, it is probably better to make early inlining more aggressive
>>>> in order to get more context sensitive profiling.
>>>
>>> I agree with that, I just would like to understand where increasing the
>>> iterations helps and if we can handle it without iterating (because Richi
>>> originally requested to drop the iteration for correctness issues)

Well, I requested to do any iteration with an IPA view in mind.
That is, iterate for cgraph cycles for example where currently
we face the situation that at least one function is inlined
unoptimized.  For this we'd like to first optimize without inlining
(well, maybe inlining doesn't hurt) and then inline (and re-optimize
if we inlined).  Indirect edges are more interesting, but basically
you'd want to re-inline once you discover new direct calls during
early opts (but then make sure to do that only after the direct
callee was early-optimized first).

Thus it would be nice if somebody could improve on the currently
very simple function ordering we apply early opts, integrating
"iteration" in a better way (not iterating over all functions
but only where it might make a difference, focused on inlining).

>>> Do you have some examples?
>>
>> We can do FDO experiment by shutting down einline. (Note that
>> increasing iteration to 2 did not actually improve performance with
>> our benchmarks).
>
> Early inlining itself has large performance impact for FDO (the
> runtime of the profile-use build). With it disabled, the FDO
> performance drops by >2% on average. The degradation is seen across
> all benchmarks except for one.

Only 2%?  You are lucky ;)  For tramp3d introducing early inlining
made a difference of 100000% ;)  (yes, statistically for tramp3d
we have for each assembler instruction generated 100 calls in the
initial code ... wheee C++ template metaprogramming!)

So indeed early inlining was absolutely required to make FDO usable
at all.

Richard.

> David
>
>
>>
>> David
>>
>>> Honza
>>>>
>>>> David
>>>>
>>>> On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>> >> Increasing the number of early inliner iterations from 1 to 2 enables
>>>> >> more indirect calls to be promoted/inlined before instrumentation.
>>>> >> This in turn reduces the instrumentation overhead, particularly for
>>>> >> more expensive indirect call topn profiling.
>>>> >
>>>> > How much difference do you get here? One possibility would be also to
>>>> > run specialized ipa-cp before profile instrumentation.
>>>> >
>>>> > Honza
>>>> >>
>>>> >> Passes internal testing and regression tests. Ok for google/4_9?
>>>> >>
>>>> >> 2014-10-18  Teresa Johnson  <tejohn...@google.com>
>>>> >>
>>>> >>         Google ref b/17934523
>>>> >>         * opts.c (finish_options): Increase max-early-inliner-iterations
>>>> >>         to 2 for profile-gen and profile-use builds.
>>>> >>
>>>> >> Index: opts.c
>>>> >> ===================================================================
>>>> >> --- opts.c      (revision 216286)
>>>> >> +++ opts.c      (working copy)
>>>> >> @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g
>>>> >>                               opts->x_param_values, opts_set->x_param_values);
>>>> >>      }
>>>> >>
>>>> >> +  if (opts->x_profile_arc_flag
>>>> >> +      || opts->x_flag_branch_probabilities)
>>>> >> +    {
>>>> >> +      maybe_set_param_value
>>>> >> +        (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2,
>>>> >> +         opts->x_param_values, opts_set->x_param_values);
>>>> >> +    }
>>>> >> +
>>>> >>    if (!(opts->x_flag_auto_profile
>>>> >>          || (opts->x_profile_arc_flag
>>>> >>              || opts->x_flag_branch_probabilities)))
>>>> >>      {
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
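
For readers following the quoted patch, here is a minimal standalone C
sketch of the logic the new hunk adds.  It assumes that
maybe_set_param_value only applies its value when the user has not set
the parameter explicitly.  Every toy_* name below is a hypothetical
stand-in made up for this sketch, not one of GCC's real types or
functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for this sketch; not GCC's real data structures.  */
enum toy_param { TOY_EARLY_INLINER_MAX_ITERATIONS, TOY_NUM_PARAMS };

struct toy_options
{
  bool profile_generate;             /* models x_profile_arc_flag */
  bool profile_use;                  /* models x_flag_branch_probabilities */
  int param_values[TOY_NUM_PARAMS];  /* current parameter values */
  bool param_set[TOY_NUM_PARAMS];    /* true if the user set the parameter */
};

/* Assumed semantics: change a parameter only if the user did not set it.  */
void
toy_maybe_set_param (struct toy_options *opts, enum toy_param num, int value)
{
  if (!opts->param_set[num])
    opts->param_values[num] = value;
}

/* Models the new hunk: raise the iteration limit to 2 for profile builds.  */
void
toy_finish_options (struct toy_options *opts)
{
  if (opts->profile_generate || opts->profile_use)
    toy_maybe_set_param (opts, TOY_EARLY_INLINER_MAX_ITERATIONS, 2);
}

/* Small self-check of the sketch; call it from a main() of your own.  */
void
toy_self_check (void)
{
  /* Profile build, parameter left at a default of 1: raised to 2.  */
  struct toy_options gen = { .profile_generate = true, .param_values = { 1 } };
  toy_finish_options (&gen);
  assert (gen.param_values[TOY_EARLY_INLINER_MAX_ITERATIONS] == 2);

  /* Profile build, but the user chose 5 explicitly: left alone.  */
  struct toy_options user = { .profile_use = true,
                              .param_values = { 5 },
                              .param_set = { true } };
  toy_finish_options (&user);
  assert (user.param_values[TOY_EARLY_INLINER_MAX_ITERATIONS] == 5);

  /* Non-profile build: the default stays at 1.  */
  struct toy_options plain = { .param_values = { 1 } };
  toy_finish_options (&plain);
  assert (plain.param_values[TOY_EARLY_INLINER_MAX_ITERATIONS] == 1);
}
```

Under that assumption, an explicit user setting of the iteration limit
would still win over the new default of 2, and builds without profile
generation or profile use would be unaffected.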