Re: [Google] Refine hot caller heuristic

Teresa Johnson Wed, 21 Aug 2013 06:48:40 -0700

> +/* Knob to control hot-caller heuristic. 0 means it is turned off, 1 means
> +   it is always applied, and 2 means it is applied only if the footprint is
> +   smaller than PARAM_HOT_CALLER_CODESIZE_THRESHOLD.  */
>  DEFPARAM (PARAM_INLINE_HOT_CALLER,
>         "inline-hot-caller",
>         "Consider cold callsites for inlining if caller contains hot code",
> +       2, 0, 2)
> +
> +/* The maximum code size estimate under which hot caller heuristic is
> +   applied.  */
> +DEFPARAM(PARAM_HOT_CALLER_CODESIZE_THRESHOLD,
> +     "hot-caller-codesize-threshold",
> +     "Maximum profile-based code size footprint estimate for "
> +     "hot caller heuristic  ",
> +     10000, 0, 0)


Out of curiousity, how sensitive is performance to the value of this
parameter? I.e. is there a clear cutoff for the codes that benefit
from disabling this inlining vs those that benefit from enabling it?

Also, have you tried spec2006? I remember that the codesize of the gcc
benchmark was above the larger 15000 threshold I use for tuning down
unrolling/peeling, and I needed to refine my heuristics to identify
profitable loops to unroll/peel even in the case of large codesize.
I'm not sure if there are more benchmarks that will be above the
smaller 10K threshold.

> +
> +DEFPARAM (PARAM_INLINE_USEFUL_COLD_CALLEE,
> +       "inline-useful-cold-callee",
> +       "Consider cold callsites for inlining if caller contains hot code",
>         1, 0, 1)

The description of this param is wrong (it is the same as the
description of PARAM_INLINE_HOT_CALLER). It should probably be
something like
"Only consider cold callsites for inlining if analysis finds
optimization opportunities"

>
>  /* Limit of iterations of early inliner.  This basically bounds number of
> Index: gcc/ipa-inline.c
> ===================================================================
> --- gcc/ipa-inline.c  (revision 201768)
> +++ gcc/ipa-inline.c  (working copy)
> @@ -528,12 +528,60 @@ big_speedup_p (struct cgraph_edge *e)
>    return false;
>  }
>
> +/* Returns true if callee of edge E is considered useful to inline
> +   even if it is cold. A callee is considered useful if there is at
> +   least one argument of pointer type that is not a pass-through.  */

Can you expand this comment a bit to add why such arguments indicate
useful inlining?

Thanks,
Teresa

> +
> +static inline bool
> +useful_cold_callee (struct cgraph_edge *e)
> +{
> +  gimple call = e->call_stmt;
> +  int n, arg_num = gimple_call_num_args (call);
> +  struct ipa_edge_args *args = IPA_EDGE_REF (e);
> +
> +  for (n = 0; n < arg_num; n++)
> +    {
> +      tree arg = gimple_call_arg (call, n);
> +      if (POINTER_TYPE_P (TREE_TYPE (arg)))
> +        {
> +       struct ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, n);
> +       if (jfunc->type != IPA_JF_PASS_THROUGH)
> +            return true;
> +        }
> +    }
> +  return false;
> +}
> +
> +/* Returns true if hot caller heuristic should be used.  */
> +
> +static inline bool
> +enable_hot_caller_heuristic (void)
> +{
> +
> +  gcov_working_set_t *ws = NULL;
> +  int size_threshold = PARAM_VALUE (PARAM_HOT_CALLER_CODESIZE_THRESHOLD);
> +  int num_counters = 0;
> +  int param_inline_hot_caller = PARAM_VALUE (PARAM_INLINE_HOT_CALLER);
> +
> +  if (param_inline_hot_caller == 0)
> +    return false;
> +  else if (param_inline_hot_caller == 1)
> +    return true;
> +
> +  ws = find_working_set(PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE));
> +  if (!ws)
> +    return false;
> +  num_counters = ws->num_counters;
> +  return num_counters <= size_threshold;
> +
> +}
>  /* Returns true if an edge or its caller are hot enough to
>     be considered for inlining.  */
>
>  static bool
>  edge_hot_enough_p (struct cgraph_edge *edge)
>  {
> +  static bool use_hot_caller_heuristic = enable_hot_caller_heuristic ();
>    if (cgraph_maybe_hot_edge_p (edge))
>      return true;
>
> @@ -543,9 +591,17 @@ edge_hot_enough_p (struct cgraph_edge *edge)
>    if (flag_auto_profile && edge->callee->count == 0
>        && edge->callee->max_bb_count > 0)
>      return false;
> -  if (PARAM_VALUE (PARAM_INLINE_HOT_CALLER)
> -      && maybe_hot_count_p (NULL, edge->caller->max_bb_count))
> -    return true;
> +  if (use_hot_caller_heuristic)
> +    {
> +      struct cgraph_node *where = edge->caller;
> +      if (maybe_hot_count_p (NULL, where->max_bb_count))
> +        {
> +          if (PARAM_VALUE (PARAM_INLINE_USEFUL_COLD_CALLEE))
> +            return useful_cold_callee (edge);
> +          else
> +            return true;
> +        }
> +    }
>    return false;
>  }

On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman <era...@google.com> wrote:
> The current hot caller heuristic simply promotes edges whose caller is
> hot. This patch does the following:
> * Turn it off for applications with large footprint since the size
> increase hurts them
> * Be more selective by considering arguments to callee when the
> heuristic is enabled.
>
> This performs well on internal benchmarks. Ok for google/4_8 branch if
> all tests pass?
>
> - Easwaran



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [Google] Refine hot caller heuristic

Reply via email to