On Wed, Aug 21, 2013 at 6:47 AM, Teresa Johnson <tejohn...@google.com> wrote: >> +/* Knob to control hot-caller heuristic. 0 means it is turned off, 1 means >> + it is always applied, and 2 means it is applied only if the footprint is >> + smaller than PARAM_HOT_CALLER_CODESIZE_THRESHOLD. */ >> DEFPARAM (PARAM_INLINE_HOT_CALLER, >> "inline-hot-caller", >> "Consider cold callsites for inlining if caller contains hot code", >> + 2, 0, 2) >> + >> +/* The maximum code size estimate under which hot caller heuristic is >> + applied. */ >> +DEFPARAM(PARAM_HOT_CALLER_CODESIZE_THRESHOLD, >> + "hot-caller-codesize-threshold", >> + "Maximum profile-based code size footprint estimate for " >> + "hot caller heuristic ", >> + 10000, 0, 0) > > Out of curiousity, how sensitive is performance to the value of this > parameter? I.e. is there a clear cutoff for the codes that benefit > from disabling this inlining vs those that benefit from enabling it? > > Also, have you tried spec2006? I remember that the codesize of the gcc > benchmark was above the larger 15000 threshold I use for tuning down > unrolling/peeling, and I needed to refine my heuristics to identify > profitable loops to unroll/peel even in the case of large codesize. > I'm not sure if there are more benchmarks that will be above the > smaller 10K threshold. The cutoff works well for internal benchmarks, but there is a wide range to set the value. I ran spec2006 and gcc benchmark does well. However, there are two c++ benchmarks (which fall below the cutoff) that degrades if I exclude PASS_THRU. I'll investigate that later.
> >> + >> +DEFPARAM (PARAM_INLINE_USEFUL_COLD_CALLEE, >> + "inline-useful-cold-callee", >> + "Consider cold callsites for inlining if caller contains hot code", >> 1, 0, 1) > > The description of this param is wrong (it is the same as the > description of PARAM_INLINE_HOT_CALLER). It should probably be > something like > "Only consider cold callsites for inlining if analysis finds > optimization opportunities" Thanks. Fixed. > >> >> /* Limit of iterations of early inliner. This basically bounds number of >> Index: gcc/ipa-inline.c >> =================================================================== >> --- gcc/ipa-inline.c (revision 201768) >> +++ gcc/ipa-inline.c (working copy) >> @@ -528,12 +528,60 @@ big_speedup_p (struct cgraph_edge *e) >> return false; >> } >> >> +/* Returns true if callee of edge E is considered useful to inline >> + even if it is cold. A callee is considered useful if there is at >> + least one argument of pointer type that is not a pass-through. */ > > Can you expand this comment a bit to add why such arguments indicate > useful inlining? Added comments. - Easwaran > > Thanks, > Teresa > >> + >> +static inline bool >> +useful_cold_callee (struct cgraph_edge *e) >> +{ >> + gimple call = e->call_stmt; >> + int n, arg_num = gimple_call_num_args (call); >> + struct ipa_edge_args *args = IPA_EDGE_REF (e); >> + >> + for (n = 0; n < arg_num; n++) >> + { >> + tree arg = gimple_call_arg (call, n); >> + if (POINTER_TYPE_P (TREE_TYPE (arg))) >> + { >> + struct ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, n); >> + if (jfunc->type != IPA_JF_PASS_THROUGH) >> + return true; >> + } >> + } >> + return false; >> +} >> + >> +/* Returns true if hot caller heuristic should be used. */ >> + >> +static inline bool >> +enable_hot_caller_heuristic (void) >> +{ >> + >> + gcov_working_set_t *ws = NULL; >> + int size_threshold = PARAM_VALUE (PARAM_HOT_CALLER_CODESIZE_THRESHOLD); >> + int num_counters = 0; >> + int param_inline_hot_caller = PARAM_VALUE (PARAM_INLINE_HOT_CALLER); >> + >> + if (param_inline_hot_caller == 0) >> + return false; >> + else if (param_inline_hot_caller == 1) >> + return true; >> + >> + ws = find_working_set(PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE)); >> + if (!ws) >> + return false; >> + num_counters = ws->num_counters; >> + return num_counters <= size_threshold; >> + >> +} >> /* Returns true if an edge or its caller are hot enough to >> be considered for inlining. */ >> >> static bool >> edge_hot_enough_p (struct cgraph_edge *edge) >> { >> + static bool use_hot_caller_heuristic = enable_hot_caller_heuristic (); >> if (cgraph_maybe_hot_edge_p (edge)) >> return true; >> >> @@ -543,9 +591,17 @@ edge_hot_enough_p (struct cgraph_edge *edge) >> if (flag_auto_profile && edge->callee->count == 0 >> && edge->callee->max_bb_count > 0) >> return false; >> - if (PARAM_VALUE (PARAM_INLINE_HOT_CALLER) >> - && maybe_hot_count_p (NULL, edge->caller->max_bb_count)) >> - return true; >> + if (use_hot_caller_heuristic) >> + { >> + struct cgraph_node *where = edge->caller; >> + if (maybe_hot_count_p (NULL, where->max_bb_count)) >> + { >> + if (PARAM_VALUE (PARAM_INLINE_USEFUL_COLD_CALLEE)) >> + return useful_cold_callee (edge); >> + else >> + return true; >> + } >> + } >> return false; >> } > > On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman <era...@google.com> wrote: >> The current hot caller heuristic simply promotes edges whose caller is >> hot. This patch does the following: >> * Turn it off for applications with large footprint since the size >> increase hurts them >> * Be more selective by considering arguments to callee when the >> heuristic is enabled. >> >> This performs well on internal benchmarks. Ok for google/4_8 branch if >> all tests pass? >> >> - Easwaran > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413