> >> +/* Member functions for string_table.  */
> >> +
> >> +string_table *
> >> +string_table::create ()
> >
> > Why this is not a constructor?
> 
> We use static initializer because it's not suggested to put too much
> logic in constructor.

Why not? :)
> >> +    }
> >
> > The two hunks probably can be unified.
> > Why get_index_by_decl does not already use dwarf name and why do you need 
> > to consider both?
> 
> get_index_by_decl is actually a wrapper of get_index. It tolerates
> error when assembler name is not emitted in the debug binary (mostly
> for functions that are fully inlined). In this case, we will first try
> to find the index of the assembler name, if not found, then bfd name.
> If still not found, then we try to find name from its abstract
> location.

Yep, I am just somewhat concerned that you need to try different variations of 
the name.
I would expect the assembler name (minus the random seed mangling) to be 
sufficient.
> 
> >
> >> +  if (DECL_ABSTRACT_ORIGIN (decl))
> >> +    return get_function_instance_by_decl (lineno, DECL_ABSTRACT_ORIGIN 
> >> (decl));
> >
> > What really happens for ipa-cp function clones and for split functions?
> 
> Their name suffix will be stripped before matching names.

I see and profile merged, that seems resonable.
> > I am not sure it is a win to have the things represented as VPT histograms 
> > when you shoot
> > for quite special purpose problem.  But lets deal with these incrementally, 
> > too.
> > (in general I do not see why you try to share so much with the gcov based 
> > VTP - it seems
> > easier to do the speculation by hand)
> 
> Basically there are 2 problem:
> 
> * before annotation. I agree this is a special purpose problem
> * after annotation. This should be the same problem as VPT.

Yep, I think adding VPT histograms to get devirtualization happen 
inter-procedurally
with LTO is a good idea.
You will need (incrementally I guess) to translate your indexes into the 
profile-id used
or you will need to initialize profile-ids accordingly, so the IPA pass is able 
to identify
the target cross-module.
> >> +{
> >> +  if (gcov_open (auto_profile_file, 1) == 0)
> >> +    error ("Cannot open profile file %s.", auto_profile_file);
> >> +
> >> +  if (gcov_read_unsigned () != GCOV_DATA_MAGIC)
> >> +    error ("AutoFDO profile magic number does not mathch.");
> >> +
> >> +  /* Skip the version number.  */
> >> +  gcov_read_unsigned ();
> >> +
> >> +  /* Skip the empty integer.  */
> >> +  gcov_read_unsigned ();
> >
> > Perhaps we can just check the values?
> 
> What do you mean? Check the value against 0?

Yes, if you have version in there it seems to make sense to check it ;)
I guess it is there so you can add extra stuff into the file format that will 
make
it incompatible with current one, so probably we should reject all versions 
greater
than 0. (especially because the tool is off-tree)
> >
> > Does this have chance to work inter-module?
> 
> Yes, it works fine with LIPO.

Important (for me) is to make it work with LTO, because LIPO is apparently not 
landing to 5.0.
> 
> Yes, this could be done.
> 
> After a second thought, I think we actually need to expose einliner
> interface to autofdo (instead of doing vpt in early inliner). This is
> because we need to mark icall as "promoted", so that later vpt passes
> will not try to promote it again. Currently in AutoFDO, we use a
> self-contained way (use a "set" to mark all promoted stmts). If we do
> vpt in einline, then we will need some flag attached to gimple to
> indicate whether it's already promoted.

If we manage to remove the iteration loop that repeativly applies the inline 
plan, you only
need way to mark edge as promoted.  I guess you can just test the speculative 
flag on the edge
to skip ones you dealt with earlier.

Lets do that incrementally though.
> +
> +@item -fauto-profile
> +@itemx -fauto-profile=@var{path}
> +@opindex fauto-profile
> +Enable sampling based feedback directed optimizations, and optimizations
> +generally profitable only with profile feedback available.
> +
> +The following options are enabled: @code{-fbranch-probabilities}, 
> @code{-fvpt},
> +@code{-funroll-loops}, @code{-fpeel-loops}, @code{-ftracer}, 
> @code{-ftree-vectorize},
> +@code{ftree-loop-distribute-patterns}

I wonder about loop peeling - we do not enale it by default because we do not 
believe
our static branch predictor can well identify loops that have low expected trip 
count
(except ones that can be fully inlined).
I am not sure auto-FDO is much better here - how reliable the info about number 
of iteratoins
is?

But this is again something that can be handled separately.

Documentation seems out of date though - it seems to enable also
-finline, -fipa-cp and cloning, predictive commoning, unswitching, 
gcse-after-reload,

You do not want to enable reorder-functions because autoFDO does not include 
the time profiler.
Also I think speculative devirtualization should not be disabled.
> +
> +If @var{path} is specified, GCC looks at the @var{path} to find
> +the profile feedback data files.
> +
> +In order to collect AutoFDO profile, you need to have:
> +
> +1. A linux system with linux perf support
> +2. An Intel processor with last branch record (LBR) support

Is it really a requirement or will it only make profiles more accurate?
> +
> +To collect the profile, first use linux perf to collect raw profile
> +(see @uref{https://perf.wiki.kernel.org/}).
> +
> +E.g.
> +perf record -e br_inst_retired:near_taken -b -o perf.data -- your_program

You need some @ stuff to mark the example.
> +
> +Then use create_gcov tool, which takes raw profile and unstripped binary to
> +generate AutoFDO profile that can be used by GCC.
> +(see @uref{https://github.com/google/autofdo}).
> +
> +E.g.
> +create_gcov --binary=your_program.unstripped --profile=perf.data 
> --gcov=profile.afdo

Same here.

I see the AutoFDO tool is under apache licence.  I wonder if we can not share 
it in tree
like we do for some other dependencies (asan) for easier use.

> Index: gcc/opts.c
> ===================================================================
> --- gcc/opts.c        (revision 215826)
> +++ gcc/opts.c        (working copy)
> @@ -1279,6 +1279,57 @@ print_specific_help (unsigned int include_flags,
>                      opts->x_help_columns, opts, lang_mask);
>  }
>  
> +/* Enable FDO-related flags.  */
> +
> +static void
> +enable_fdo_optimizations (struct gcc_options *opts,
> +                       struct gcc_options *opts_set,
> +                       int value)
> +{
> +  if (!opts_set->x_flag_branch_probabilities)
> +    opts->x_flag_branch_probabilities = value;
> +  if (!opts_set->x_flag_profile_values)
> +    opts->x_flag_profile_values = value;
> +  if (!opts_set->x_flag_unroll_loops)
> +    opts->x_flag_unroll_loops = value;
> +  if (!opts_set->x_flag_peel_loops)
> +    opts->x_flag_peel_loops = value;
> +  if (!opts_set->x_flag_tracer)
> +    opts->x_flag_tracer = value;
> +  if (!opts_set->x_flag_value_profile_transformations)
> +    opts->x_flag_value_profile_transformations = value;
> +  if (!opts_set->x_flag_inline_functions)
> +    opts->x_flag_inline_functions = value;
> +  if (!opts_set->x_flag_ipa_cp)
> +    opts->x_flag_ipa_cp = value;
> +  if (!opts_set->x_flag_ipa_cp_clone
> +      && value && opts->x_flag_ipa_cp)
> +    opts->x_flag_ipa_cp_clone = value;
> +  if (!opts_set->x_flag_predictive_commoning)
> +    opts->x_flag_predictive_commoning = value;
> +  if (!opts_set->x_flag_unswitch_loops)
> +    opts->x_flag_unswitch_loops = value;
> +  if (!opts_set->x_flag_gcse_after_reload)
> +    opts->x_flag_gcse_after_reload = value;
> +  if (!opts_set->x_flag_tree_loop_vectorize
> +      && !opts_set->x_flag_tree_vectorize)
> +    opts->x_flag_tree_loop_vectorize = value;
> +  if (!opts_set->x_flag_tree_slp_vectorize
> +      && !opts_set->x_flag_tree_vectorize)
> +    opts->x_flag_tree_slp_vectorize = value;
> +  if (!opts_set->x_flag_vect_cost_model)
> +    opts->x_flag_vect_cost_model = VECT_COST_MODEL_DYNAMIC;
> +  if (!opts_set->x_flag_tree_loop_distribute_patterns)
> +    opts->x_flag_tree_loop_distribute_patterns = value;
> +  if (!opts_set->x_flag_profile_reorder_functions)
> +    opts->x_flag_profile_reorder_functions = value;
> +  /* Indirect call profiling should do all useful transformations
> +     speculative devirtualization does.  */
> +  if (!opts_set->x_flag_devirtualize_speculatively
> +      && opts->x_flag_value_profile_transformations)
> +    opts->x_flag_devirtualize_speculatively = false;
Please move x_flag_devirtualize_speculatively and 
x_flag_profile_reorder_functions
to -fprofile-use only.
> +}
> +
>  /* Handle target- and language-independent options.  Return zero to
>     generate an "unknown option" message.  Only options that need
>     extra handling need to be listed here; if you simply want
> @@ -1746,50 +1797,23 @@ common_handle_option (struct gcc_options *opts,
>        value = true;
>        /* No break here - do -fprofile-use processing. */
>      case OPT_fprofile_use:
> -      if (!opts_set->x_flag_branch_probabilities)
> -     opts->x_flag_branch_probabilities = value;
> -      if (!opts_set->x_flag_profile_values)
> -     opts->x_flag_profile_values = value;
> -      if (!opts_set->x_flag_unroll_loops)
> -     opts->x_flag_unroll_loops = value;
> -      if (!opts_set->x_flag_peel_loops)
> -     opts->x_flag_peel_loops = value;
> -      if (!opts_set->x_flag_tracer)
> -     opts->x_flag_tracer = value;
> -      if (!opts_set->x_flag_value_profile_transformations)
> -     opts->x_flag_value_profile_transformations = value;
> -      if (!opts_set->x_flag_inline_functions)
> -     opts->x_flag_inline_functions = value;
> -      if (!opts_set->x_flag_ipa_cp)
> -     opts->x_flag_ipa_cp = value;
> -      if (!opts_set->x_flag_ipa_cp_clone
> -       && value && opts->x_flag_ipa_cp)
> -     opts->x_flag_ipa_cp_clone = value;
> -      if (!opts_set->x_flag_predictive_commoning)
> -     opts->x_flag_predictive_commoning = value;
> -      if (!opts_set->x_flag_unswitch_loops)
> -     opts->x_flag_unswitch_loops = value;
> -      if (!opts_set->x_flag_gcse_after_reload)
> -     opts->x_flag_gcse_after_reload = value;
> -      if (!opts_set->x_flag_tree_loop_vectorize
> -          && !opts_set->x_flag_tree_vectorize)
> -     opts->x_flag_tree_loop_vectorize = value;
> -      if (!opts_set->x_flag_tree_slp_vectorize
> -          && !opts_set->x_flag_tree_vectorize)
> -     opts->x_flag_tree_slp_vectorize = value;
> -      if (!opts_set->x_flag_vect_cost_model)
> -     opts->x_flag_vect_cost_model = VECT_COST_MODEL_DYNAMIC;
> -      if (!opts_set->x_flag_tree_loop_distribute_patterns)
> -     opts->x_flag_tree_loop_distribute_patterns = value;
> -      if (!opts_set->x_flag_profile_reorder_functions)
> -     opts->x_flag_profile_reorder_functions = value;
> -      /* Indirect call profiling should do all useful transformations
> -      speculative devirtualization does.  */
> -      if (!opts_set->x_flag_devirtualize_speculatively
> -       && opts->x_flag_value_profile_transformations)
> -     opts->x_flag_devirtualize_speculatively = false;
> +      enable_fdo_optimizations (opts, opts_set, value);
>        break;
>  
> +    case OPT_fauto_profile_:
> +      opts->x_auto_profile_file = xstrdup (arg);
> +      opts->x_flag_auto_profile = true;
> +      value = true;
> +      /* No break here - do -fauto-profile processing. */
> +    case OPT_fauto_profile:
> +      enable_fdo_optimizations (opts, opts_set, value);
> +      if (!opts_set->x_flag_profile_correction)
> +     opts->x_flag_profile_correction = value;
> +      maybe_set_param_value (
> +     PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
> +     opts->x_param_values, opts_set->x_param_values);
> +      break;
> +
>      case OPT_fprofile_generate_:
>        opts->x_profile_data_prefix = xstrdup (arg);
>        value = true;

Patch is OK with these changes. Please also update news.html and perhaps even
news on the main page? It is quite nice user visible change.

Honza

Reply via email to