> >> +/* Member functions for string_table. */ > >> + > >> +string_table * > >> +string_table::create () > > > > Why this is not a constructor? > > We use static initializer because it's not suggested to put too much > logic in constructor.
Why not? :) > >> + } > > > > The two hunks probably can be unified. > > Why get_index_by_decl does not already use dwarf name and why do you need > > to consider both? > > get_index_by_decl is actually a wrapper of get_index. It tolerates > error when assembler name is not emitted in the debug binary (mostly > for functions that are fully inlined). In this case, we will first try > to find the index of the assembler name, if not found, then bfd name. > If still not found, then we try to find name from its abstract > location. Yep, I am just somewhat concerned that you need to try different variations of the name. I would expect the assembler name (minus the random seed mangling) to be sufficient. > > > > >> + if (DECL_ABSTRACT_ORIGIN (decl)) > >> + return get_function_instance_by_decl (lineno, DECL_ABSTRACT_ORIGIN > >> (decl)); > > > > What really happens for ipa-cp function clones and for split functions? > > Their name suffix will be stripped before matching names. I see and profile merged, that seems resonable. > > I am not sure it is a win to have the things represented as VPT histograms > > when you shoot > > for quite special purpose problem. But lets deal with these incrementally, > > too. > > (in general I do not see why you try to share so much with the gcov based > > VTP - it seems > > easier to do the speculation by hand) > > Basically there are 2 problem: > > * before annotation. I agree this is a special purpose problem > * after annotation. This should be the same problem as VPT. Yep, I think adding VPT histograms to get devirtualization happen inter-procedurally with LTO is a good idea. You will need (incrementally I guess) to translate your indexes into the profile-id used or you will need to initialize profile-ids accordingly, so the IPA pass is able to identify the target cross-module. > >> +{ > >> + if (gcov_open (auto_profile_file, 1) == 0) > >> + error ("Cannot open profile file %s.", auto_profile_file); > >> + > >> + if (gcov_read_unsigned () != GCOV_DATA_MAGIC) > >> + error ("AutoFDO profile magic number does not mathch."); > >> + > >> + /* Skip the version number. */ > >> + gcov_read_unsigned (); > >> + > >> + /* Skip the empty integer. */ > >> + gcov_read_unsigned (); > > > > Perhaps we can just check the values? > > What do you mean? Check the value against 0? Yes, if you have version in there it seems to make sense to check it ;) I guess it is there so you can add extra stuff into the file format that will make it incompatible with current one, so probably we should reject all versions greater than 0. (especially because the tool is off-tree) > > > > Does this have chance to work inter-module? > > Yes, it works fine with LIPO. Important (for me) is to make it work with LTO, because LIPO is apparently not landing to 5.0. > > Yes, this could be done. > > After a second thought, I think we actually need to expose einliner > interface to autofdo (instead of doing vpt in early inliner). This is > because we need to mark icall as "promoted", so that later vpt passes > will not try to promote it again. Currently in AutoFDO, we use a > self-contained way (use a "set" to mark all promoted stmts). If we do > vpt in einline, then we will need some flag attached to gimple to > indicate whether it's already promoted. If we manage to remove the iteration loop that repeativly applies the inline plan, you only need way to mark edge as promoted. I guess you can just test the speculative flag on the edge to skip ones you dealt with earlier. Lets do that incrementally though. > + > +@item -fauto-profile > +@itemx -fauto-profile=@var{path} > +@opindex fauto-profile > +Enable sampling based feedback directed optimizations, and optimizations > +generally profitable only with profile feedback available. > + > +The following options are enabled: @code{-fbranch-probabilities}, > @code{-fvpt}, > +@code{-funroll-loops}, @code{-fpeel-loops}, @code{-ftracer}, > @code{-ftree-vectorize}, > +@code{ftree-loop-distribute-patterns} I wonder about loop peeling - we do not enale it by default because we do not believe our static branch predictor can well identify loops that have low expected trip count (except ones that can be fully inlined). I am not sure auto-FDO is much better here - how reliable the info about number of iteratoins is? But this is again something that can be handled separately. Documentation seems out of date though - it seems to enable also -finline, -fipa-cp and cloning, predictive commoning, unswitching, gcse-after-reload, You do not want to enable reorder-functions because autoFDO does not include the time profiler. Also I think speculative devirtualization should not be disabled. > + > +If @var{path} is specified, GCC looks at the @var{path} to find > +the profile feedback data files. > + > +In order to collect AutoFDO profile, you need to have: > + > +1. A linux system with linux perf support > +2. An Intel processor with last branch record (LBR) support Is it really a requirement or will it only make profiles more accurate? > + > +To collect the profile, first use linux perf to collect raw profile > +(see @uref{https://perf.wiki.kernel.org/}). > + > +E.g. > +perf record -e br_inst_retired:near_taken -b -o perf.data -- your_program You need some @ stuff to mark the example. > + > +Then use create_gcov tool, which takes raw profile and unstripped binary to > +generate AutoFDO profile that can be used by GCC. > +(see @uref{https://github.com/google/autofdo}). > + > +E.g. > +create_gcov --binary=your_program.unstripped --profile=perf.data > --gcov=profile.afdo Same here. I see the AutoFDO tool is under apache licence. I wonder if we can not share it in tree like we do for some other dependencies (asan) for easier use. > Index: gcc/opts.c > =================================================================== > --- gcc/opts.c (revision 215826) > +++ gcc/opts.c (working copy) > @@ -1279,6 +1279,57 @@ print_specific_help (unsigned int include_flags, > opts->x_help_columns, opts, lang_mask); > } > > +/* Enable FDO-related flags. */ > + > +static void > +enable_fdo_optimizations (struct gcc_options *opts, > + struct gcc_options *opts_set, > + int value) > +{ > + if (!opts_set->x_flag_branch_probabilities) > + opts->x_flag_branch_probabilities = value; > + if (!opts_set->x_flag_profile_values) > + opts->x_flag_profile_values = value; > + if (!opts_set->x_flag_unroll_loops) > + opts->x_flag_unroll_loops = value; > + if (!opts_set->x_flag_peel_loops) > + opts->x_flag_peel_loops = value; > + if (!opts_set->x_flag_tracer) > + opts->x_flag_tracer = value; > + if (!opts_set->x_flag_value_profile_transformations) > + opts->x_flag_value_profile_transformations = value; > + if (!opts_set->x_flag_inline_functions) > + opts->x_flag_inline_functions = value; > + if (!opts_set->x_flag_ipa_cp) > + opts->x_flag_ipa_cp = value; > + if (!opts_set->x_flag_ipa_cp_clone > + && value && opts->x_flag_ipa_cp) > + opts->x_flag_ipa_cp_clone = value; > + if (!opts_set->x_flag_predictive_commoning) > + opts->x_flag_predictive_commoning = value; > + if (!opts_set->x_flag_unswitch_loops) > + opts->x_flag_unswitch_loops = value; > + if (!opts_set->x_flag_gcse_after_reload) > + opts->x_flag_gcse_after_reload = value; > + if (!opts_set->x_flag_tree_loop_vectorize > + && !opts_set->x_flag_tree_vectorize) > + opts->x_flag_tree_loop_vectorize = value; > + if (!opts_set->x_flag_tree_slp_vectorize > + && !opts_set->x_flag_tree_vectorize) > + opts->x_flag_tree_slp_vectorize = value; > + if (!opts_set->x_flag_vect_cost_model) > + opts->x_flag_vect_cost_model = VECT_COST_MODEL_DYNAMIC; > + if (!opts_set->x_flag_tree_loop_distribute_patterns) > + opts->x_flag_tree_loop_distribute_patterns = value; > + if (!opts_set->x_flag_profile_reorder_functions) > + opts->x_flag_profile_reorder_functions = value; > + /* Indirect call profiling should do all useful transformations > + speculative devirtualization does. */ > + if (!opts_set->x_flag_devirtualize_speculatively > + && opts->x_flag_value_profile_transformations) > + opts->x_flag_devirtualize_speculatively = false; Please move x_flag_devirtualize_speculatively and x_flag_profile_reorder_functions to -fprofile-use only. > +} > + > /* Handle target- and language-independent options. Return zero to > generate an "unknown option" message. Only options that need > extra handling need to be listed here; if you simply want > @@ -1746,50 +1797,23 @@ common_handle_option (struct gcc_options *opts, > value = true; > /* No break here - do -fprofile-use processing. */ > case OPT_fprofile_use: > - if (!opts_set->x_flag_branch_probabilities) > - opts->x_flag_branch_probabilities = value; > - if (!opts_set->x_flag_profile_values) > - opts->x_flag_profile_values = value; > - if (!opts_set->x_flag_unroll_loops) > - opts->x_flag_unroll_loops = value; > - if (!opts_set->x_flag_peel_loops) > - opts->x_flag_peel_loops = value; > - if (!opts_set->x_flag_tracer) > - opts->x_flag_tracer = value; > - if (!opts_set->x_flag_value_profile_transformations) > - opts->x_flag_value_profile_transformations = value; > - if (!opts_set->x_flag_inline_functions) > - opts->x_flag_inline_functions = value; > - if (!opts_set->x_flag_ipa_cp) > - opts->x_flag_ipa_cp = value; > - if (!opts_set->x_flag_ipa_cp_clone > - && value && opts->x_flag_ipa_cp) > - opts->x_flag_ipa_cp_clone = value; > - if (!opts_set->x_flag_predictive_commoning) > - opts->x_flag_predictive_commoning = value; > - if (!opts_set->x_flag_unswitch_loops) > - opts->x_flag_unswitch_loops = value; > - if (!opts_set->x_flag_gcse_after_reload) > - opts->x_flag_gcse_after_reload = value; > - if (!opts_set->x_flag_tree_loop_vectorize > - && !opts_set->x_flag_tree_vectorize) > - opts->x_flag_tree_loop_vectorize = value; > - if (!opts_set->x_flag_tree_slp_vectorize > - && !opts_set->x_flag_tree_vectorize) > - opts->x_flag_tree_slp_vectorize = value; > - if (!opts_set->x_flag_vect_cost_model) > - opts->x_flag_vect_cost_model = VECT_COST_MODEL_DYNAMIC; > - if (!opts_set->x_flag_tree_loop_distribute_patterns) > - opts->x_flag_tree_loop_distribute_patterns = value; > - if (!opts_set->x_flag_profile_reorder_functions) > - opts->x_flag_profile_reorder_functions = value; > - /* Indirect call profiling should do all useful transformations > - speculative devirtualization does. */ > - if (!opts_set->x_flag_devirtualize_speculatively > - && opts->x_flag_value_profile_transformations) > - opts->x_flag_devirtualize_speculatively = false; > + enable_fdo_optimizations (opts, opts_set, value); > break; > > + case OPT_fauto_profile_: > + opts->x_auto_profile_file = xstrdup (arg); > + opts->x_flag_auto_profile = true; > + value = true; > + /* No break here - do -fauto-profile processing. */ > + case OPT_fauto_profile: > + enable_fdo_optimizations (opts, opts_set, value); > + if (!opts_set->x_flag_profile_correction) > + opts->x_flag_profile_correction = value; > + maybe_set_param_value ( > + PARAM_EARLY_INLINER_MAX_ITERATIONS, 10, > + opts->x_param_values, opts_set->x_param_values); > + break; > + > case OPT_fprofile_generate_: > opts->x_profile_data_prefix = xstrdup (arg); > value = true; Patch is OK with these changes. Please also update news.html and perhaps even news on the main page? It is quite nice user visible change. Honza