On 6/18/19 11:02 AM, luoxhu wrote:
> Hi,
> 
> On 2019/6/18 13:51, Martin Liška wrote:
>> On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
>>
>> Hello.
>>
>> Thank you for the interest in the area.
>>
>>> This patch aims to fix PR69678 caused by PGO indirect call profiling bugs.
>>> Currently the default instrument function can only find the indirect 
>>> function
>>> that called more than 50% with an incorrect count number returned.
>> Can you please explain what you mean by 'an incorrect count number returned'?
> 
> For a test case indir-call-topn.c, it include 2 indirect calls "one" and 
> "two". the profiling data is as below with trunk code (including your patch, 
> count[0] and count[2] is switched by your code, the count[0] is used in 
> ipa-profile but only support the top1 format, my patch adds the support for 
> the topn format. count[0] was incorrect as WITHOUT your patch it is 0,  
> things getting better with your fix as the count[0] is 350000000, but still 
> not correct, in fact, "one" is running 175000000 times, and "two" is running 
> the other 175000000 times):
> 
> indir-call-topn.gcda:   22:    01a90000:  18:COUNTERS indirect_call 9 counts
> indir-call-topn.gcda:   24:                   0: *350000000 1868707024 0* 0 0 
> 0 0 0
> 
> Running with the "--param indir-call-topn-profile=1" will give below profile 
> data, My patch is based on this profile result and do the optimization for 
> multiple indirect targets, performance can get much improve on this testcase 
> and SPEC2017 for some benchmarks(LLVM already support this several years 
> ago...).
> 
> indir-call-topn.gcda:   26:    01b10000:  18:COUNTERS indirect_call_topn 9 
> counts
> indir-call-topn.gcda:   28:                   0: *0 969338501 175000000 
> 1868707024 175000000* 0 0 0
> 
> 
> test case indir-call-topn.c:
> 
> #include <stdio.h>
> 
> 
> typedef int (*fptr) (int);
> int
> one (int a)
> {
>   return 1;
> }
> 
> int
> two (int a)
> {
>   return 0;
> }
> 
> fptr table[] = {&one, &two};
> 
> int
> main()
> {
>   int i, x;
>   fptr p = &one;
> 
>   one (3);
> 
>   for (i = 0; i < 350000000; i++)
>     {
>       x = (*p) (3);
>       p = table[x];
>     }
>   printf ("done:%d\n", x);
> }

I've got it. So it's situation where you have distribution equal to 50% and 
50%. Note that it's
the only valid situation where both edges with be >= 50%. That's the threshold 
for which
we speculatively devirtualize edges. That said, you don't need generic topn 
counter, but a probably
only a top2 counter which can be generalized from single-value counter type. 
I'm saying that
because I removed the TOPN, mainly due to:
https://github.com/gcc-mirror/gcc/commit/5cb221f2b9c268df47c97b4837230b15e65f9c14#diff-d003c64ae14449d86df03508de98bde7L179

which is over-complicated profiling function. And the changes that I've done 
recently are motivated
to preserve a stable builds. That's achieved by noticing that a single-value 
counter can't handle all
seen values.

> 
>>
>>>   This patch
>>> leverages the "--param indir-call-topn-profile=1" and enables multiple 
>>> indirect
>> Note that I've remove indir-call-topn-profile last week, the patch will not 
>> apply
>> on current trunk. However, I can help you how to adapt single-value counters
>> to support tracking of multiple values.
> 
> It will be very useful if you help me to track multiple values similarly on 
> trunk code. I will rebase to your code once topn is ready again. Actually 
> topn is more general and top1 is included in, I thought that top1 should be 
> removed instead of topn, though topn will consume longer time than top1 in 
> profile-generate.

As mentioned earlier, I really don't want to put TOPN back. I can help you once 
Honza will agree with the general IPA changes.

> 
>>
>>> targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result, 
>>> function
>>> specialization, profiling, partial devirtualization, inlining and cloning 
>>> could
>>> be done successfully based on it.
>> This decision is definitely big question for Honza?
>>
>>> Performance can get improved 3x (1.7 sec -> 0.4 sec) on simple tests.
>>> Details are:
>>>    1.  When do PGO with indir-call-topn-profile, the gcda data format is not
>>>    supported in ipa-profile pass,
>> If you take a look at gcc/ipa-profile.c:195 you can see how the probability
>> is propagated to IPA passes. Why is that not sufficient?
> 
> Current code only support single indirect target, I need track multiple 
> indirect targets and create multiple speculative edges on single indirect 
> call statement.
> 
> What's more, many ICEs happened in later stage due to single speculative 
> target design, part of this patch is to solve the ICEs of multiple 
> speculative target edges handling.

Well, to be honest I don't like the patch much. It brings another level of 
complexity for a quite rare situation where one
calls 2 functions via an indirect call. And as mentioned, current IPA 
optimization are not happy about multiple indirect branches.

Martin

> 
> 
> Thanks
> 
> Xionghu
> 
>>
>> Martin
>>
>>> so add variables to pass the information
>>>    through passes, and postpone gimple_ic to ipa-profile like default as 
>>> inline
>>>    pass will decide whether it is benefit to transform indirect call.
>>>    2.  Enable LTO WPA/LTRANS stage multiple indirect call targets analysis 
>>> for
>>>    profile full support in ipa passes and cgraph_edge functions.
>>>    3.  Fix various hidden speculative call ICEs exposed after enabling this
>>>    feature when running SPEC2017.
>>>    4.  Add 1 in module testcase and 2 cross module testcases.
>>>    5.  TODOs:
>>>      5.1.  Some reference info will be dropped from WPA to LTRANS, so
>>>      reference check will be difficult in LTRANS, need replace the strstr
>>>      with reference compare.
>>>      5.2.  Some duplicate code need be removed as top1 and topn share same 
>>> logic.
>>>      Actually top1 related logic could be eliminated totally as topn 
>>> includes it.
>>>      5.3.  Split patch maybe needed as too big but not sure how many would 
>>> be
>>>      reasonable.
>>>    6.  Performance result for ppc64le:
>>>      6.1.  Representative test: indir-call-prof-topn.c runtime improved from
>>>      1.7s to 0.4s.
>>>      6.2.  SPEC2017 peakrate:
>>>          523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r 
>>> (+13.33%);
>>>          525.x264_r (-5.29%).
>>>          No big changes of other benchmarks.
>>>          Option: -Ofast -mcpu=power8
>>>          PASS1_OPTIMIZE: -fprofile-generate --param 
>>> indir-call-topn-profile=1 -flto
>>>          PASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 
>>> -flto
>>>          -fprofile-correction
>>>      6.3.  No performance change on PHP benchmark.
>>>    7.  Bootstrap and regression test passed on Power8-LE.
>>>
>>> gcc/ChangeLog
>>>
>>>     2019-06-17  Xiong Hu Luo  <luo...@linux.ibm.com>
>>>
>>>     PR ipa/69678
>>>     * cgraph.c (cgraph_node::get_create): Copy profile_id.
>>>     (cgraph_edge::speculative_call_info): Find real
>>>     reference for indirect targets.
>>>     (cgraph_edge::resolve_speculation): Add speculative code process
>>>     for indirect targets.
>>>     (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
>>>     (cgraph_node::verify_node): Likewise.
>>>     * cgraph.h (common_target_ids): New variable.
>>>     (common_target_probabilities): Likewise.
>>>     (num_of_ics): Likewise.
>>>     * cgraphclones.c (cgraph_node::create_clone): Copy profile_id.
>>>     * ipa-inline.c (inline_small_functions): Add iterator update.
>>>     * ipa-profile.c (ipa_profile_generate_summary): Add indirect
>>>     multiple targets logic.
>>>     (ipa_profile): Likewise.
>>>     * ipa-utils.c (ipa_merge_profiles): Clone speculative src's
>>>     referrings to dst.
>>>     * ipa.c (process_references): Fix typo.
>>>     * lto-cgraph.c (lto_output_edge): Add indirect multiple targets
>>>     logic.
>>>     (input_edge): Likewise.
>>>     * predict.c (dump_prediction): Revome edges count assert to be
>>>     precise.
>>>     * tree-profile.c (gimple_gen_ic_profiler): Use the new variable
>>>     __gcov_indirect_call.counters and __gcov_indirect_call.callee.
>>>     (gimple_gen_ic_func_profiler): Likewise.
>>>     (pass_ipa_tree_profile::gate): Fix comment typos.
>>>     * tree-inline.c (copy_bb): Duplicate all the speculative edges
>>>     if indirect call contains multiple speculative targets.
>>>     * value-prof.c (check_counter): Proportion the counter for
>>>     multiple targets.
>>>     (ic_transform_topn): New function.
>>>     (gimple_ic_transform): Handle topn case, fix comment typos.
>>>
>>> gcc/testsuite/ChangeLog
>>>
>>>     2019-06-17  Xiong Hu Luo  <luo...@linux.ibm.com>
>>>
>>>     PR ipa/69678
>>>     * gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
>>>     * gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
>>>     * gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
>>>     * gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
>>> ---
>>>   gcc/cgraph.c                                  |  38 +++-
>>>   gcc/cgraph.h                                  |   9 +-
>>>   gcc/cgraphclones.c                            |   1 +
>>>   gcc/ipa-inline.c                              |   3 +
>>>   gcc/ipa-profile.c                             | 185 +++++++++++++++++-
>>>   gcc/ipa-utils.c                               |   5 +
>>>   gcc/ipa.c                                     |   2 +-
>>>   gcc/lto-cgraph.c                              |  38 ++++
>>>   gcc/predict.c                                 |   1 -
>>>   .../tree-prof/crossmodule-indir-call-topn-1.c |  35 ++++
>>>   .../crossmodule-indir-call-topn-1a.c          |  22 +++
>>>   .../tree-prof/crossmodule-indir-call-topn-2.c |  42 ++++
>>>   .../gcc.dg/tree-prof/indir-call-prof-topn.c   |  38 ++++
>>>   gcc/tree-inline.c                             |  97 +++++----
>>>   gcc/tree-profile.c                            |  12 +-
>>>   gcc/value-prof.c                              | 146 +++++++++++++-
>>>   16 files changed, 606 insertions(+), 68 deletions(-)
>>>   create mode 100644 
>>> gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>>>   create mode 100644 
>>> gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>>>   create mode 100644 
>>> gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>>>
>>> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
>>> index de82316d4b1..0d373a67d1b 100644
>>> --- a/gcc/cgraph.c
>>> +++ b/gcc/cgraph.c
>>> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>>>       fprintf (dump_file, "Introduced new external node "
>>>            "(%s) and turned into root of the clone tree.\n",
>>>            node->dump_name ());
>>> +      node->profile_id = first_clone->profile_id;
>>>       }
>>>     else if (dump_file)
>>>       fprintf (dump_file, "Introduced new external node "
>>> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge 
>>> *&direct,
>>>     int i;
>>>     cgraph_edge *e2;
>>>     cgraph_edge *e = this;
>>> +  cgraph_node *referred_node;
>>>       if (!e->indirect_unknown_callee)
>>>       for (e2 = e->caller->indirect_calls;
>>> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge 
>>> *&direct,
>>>       && ((ref->stmt && ref->stmt == e->call_stmt)
>>>           || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>>>         {
>>> -    reference = ref;
>>> -    break;
>>> +    if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>> +      {
>>> +        referred_node = dyn_cast<cgraph_node *> (ref->referred);
>>> +        if (strstr (e->callee->name (), referred_node->name ()))
>>> +          {
>>> +        reference = ref;
>>> +        break;
>>> +          }
>>> +      }
>>> +    else
>>> +      {
>>> +        reference = ref;
>>> +        break;
>>> +      }
>>>         }
>>>       /* Speculative edge always consist of all three components - direct 
>>> edge,
>>> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>>>            in the functions inlined through it.  */
>>>       }
>>>     edge->count += e2->count;
>>> -  edge->speculative = false;
>>> +  if (edge->indirect_info && edge->indirect_info->num_of_ics)
>>> +    {
>>> +      edge->indirect_info->num_of_ics--;
>>> +      if (edge->indirect_info->num_of_ics == 0)
>>> +    edge->speculative = false;
>>> +    }
>>> +  else
>>> +    edge->speculative = false;
>>>     e2->speculative = false;
>>>     ref->remove_reference ();
>>>     if (e2->indirect_unknown_callee || e2->inline_failed)
>>> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>>>         e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>>>                                false);
>>>         e->count = gimple_bb (e->call_stmt)->count;
>>> -      e2->speculative = false;
>>> +      if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>> +        {
>>> +          e2->indirect_info->num_of_ics--;
>>> +          if (e2->indirect_info->num_of_ics == 0)
>>> +        e2->speculative = false;
>>> +        }
>>> +      else
>>> +        e2->speculative = false;
>>>         e2->count = gimple_bb (e2->call_stmt)->count;
>>>         ref->speculative = false;
>>>         ref->stmt = NULL;
>>> @@ -3407,7 +3435,7 @@ cgraph_node::verify_node (void)
>>>           for (e = callees; e; e = e->next_callee)
>>>       {
>>> -      if (!e->aux)
>>> +      if (!e->aux && !e->speculative)
>>>           {
>>>             error ("edge %s->%s has no corresponding call_stmt",
>>>                identifier_to_locale (e->caller->name ()),
>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>>> index c294602d762..ed0fbc60432 100644
>>> --- a/gcc/cgraph.h
>>> +++ b/gcc/cgraph.h
>>> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>>>   #include "profile-count.h"
>>>   #include "ipa-ref.h"
>>>   #include "plugin-api.h"
>>> +#include "gcov-io.h"
>>>     extern void debuginfo_early_init (void);
>>>   extern void debuginfo_init (void);
>>> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>>>     int param_index;
>>>     /* ECF flags determined from the caller.  */
>>>     int ecf_flags;
>>> -  /* Profile_id of common target obtrained from profile.  */
>>> +  /* Profile_id of common target obtained from profile.  */
>>>     int common_target_id;
>>>     /* Probability that call will land in function with COMMON_TARGET_ID.  
>>> */
>>>     int common_target_probability;
>>>   +  /* Profile_id of common target obtained from profile.  */
>>> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>> +  /* Probabilities that call will land in function with COMMON_TARGET_IDS. 
>>>  */
>>> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>> +  unsigned num_of_ics;
>>> +
>>>     /* Set when the call is a virtual call with the parameter being the
>>>        associated object pointer rather than a simple direct call.  */
>>>     unsigned polymorphic : 1;
>>> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
>>> index 15f7e119d18..94f424bc10c 100644
>>> --- a/gcc/cgraphclones.c
>>> +++ b/gcc/cgraphclones.c
>>> @@ -467,6 +467,7 @@ cgraph_node::create_clone (tree new_decl, profile_count 
>>> prof_count,
>>>     new_node->icf_merged = icf_merged;
>>>     new_node->merged_comdat = merged_comdat;
>>>     new_node->thunk = thunk;
>>> +  new_node->profile_id = profile_id;
>>>       new_node->clone.tree_map = NULL;
>>>     new_node->clone.args_to_skip = args_to_skip;
>>> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
>>> index 360c3de3289..ef2b217b3f9 100644
>>> --- a/gcc/ipa-inline.c
>>> +++ b/gcc/ipa-inline.c
>>> @@ -1866,12 +1866,15 @@ inline_small_functions (void)
>>>       }
>>>         if (has_speculative)
>>>       for (edge = node->callees; edge; edge = next)
>>> +    {
>>> +      next = edge->next_callee;
>>>         if (edge->speculative && !speculation_useful_p (edge,
>>>                                 edge->aux != NULL))
>>>           {
>>>             edge->resolve_speculation ();
>>>             update = true;
>>>           }
>>> +    }
>>>         if (update)
>>>       {
>>>         struct cgraph_node *where = node->global.inlined_to
>>> diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
>>> index de9563d808c..d04476295a0 100644
>>> --- a/gcc/ipa-profile.c
>>> +++ b/gcc/ipa-profile.c
>>> @@ -168,6 +168,10 @@ ipa_profile_generate_summary (void)
>>>     struct cgraph_node *node;
>>>     gimple_stmt_iterator gsi;
>>>     basic_block bb;
>>> +  enum hist_type type;
>>> +
>>> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? 
>>> HIST_TYPE_INDIR_CALL_TOPN
>>> +                             : HIST_TYPE_INDIR_CALL;
>>>       hash_table<histogram_hash> hashtable (10);
>>>     @@ -186,10 +190,10 @@ ipa_profile_generate_summary (void)
>>>             histogram_value h;
>>>             h = gimple_histogram_value_of_type
>>>               (DECL_STRUCT_FUNCTION (node->decl),
>>> -             stmt, HIST_TYPE_INDIR_CALL);
>>> +             stmt, type);
>>>             /* No need to do sanity check: gimple_ic_transform already
>>>                takes away bad histograms.  */
>>> -          if (h)
>>> +          if (h && type == HIST_TYPE_INDIR_CALL)
>>>               {
>>>                 /* counter 0 is target, counter 1 is number of execution we 
>>> called target,
>>>                counter 2 is total number of executions.  */
>>> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>>>                 gimple_remove_histogram_value (DECL_STRUCT_FUNCTION 
>>> (node->decl),
>>>                                 stmt, h);
>>>               }
>>> +          else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
>>> +            {
>>> +              unsigned j;
>>> +              struct cgraph_edge *e = node->get_edge (stmt);
>>> +              if (e && !e->indirect_unknown_callee)
>>> +            continue;
>>> +
>>> +              e->indirect_info->num_of_ics = 0;
>>> +              for (j = 1; j < h->n_counters; j += 2)
>>> +            {
>>> +              if (h->hvalue.counters[j] == 0)
>>> +                continue;
>>> +
>>> +              e->indirect_info->common_target_ids[j / 2]
>>> +                = h->hvalue.counters[j];
>>> +              e->indirect_info->common_target_probabilities[j / 2]
>>> +                = GCOV_COMPUTE_SCALE (
>>> +                  h->hvalue.counters[j + 1],
>>> +                  gimple_bb (stmt)->count.ipa ().to_gcov_type ());
>>> +              if (e->indirect_info
>>> +                ->common_target_probabilities[j / 2]
>>> +                  > REG_BR_PROB_BASE)
>>> +                {
>>> +                  if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Probability capped to 1\n");
>>> +                  e->indirect_info
>>> +                ->common_target_probabilities[j / 2]
>>> +                = REG_BR_PROB_BASE;
>>> +                }
>>> +              e->indirect_info->num_of_ics++;
>>> +            }
>>> +
>>> +              gcc_assert (e->indirect_info->num_of_ics
>>> +                  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>>> +
>>> +              gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (
>>> +                               node->decl),
>>> +                             stmt, h);
>>> +            }
>>>           }
>>>             time += estimate_num_insns (stmt, &eni_time_weights);
>>>             size += estimate_num_insns (stmt, &eni_size_weights);
>>> @@ -492,6 +536,7 @@ ipa_profile (void)
>>>     int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted 
>>> = 0;
>>>     int nmismatch = 0, nimpossible = 0;
>>>     bool node_map_initialized = false;
>>> +  gcov_type threshold;
>>>       if (dump_file)
>>>       dump_histogram (dump_file, histogram);
>>> @@ -500,14 +545,12 @@ ipa_profile (void)
>>>         overall_time += histogram[i]->count * histogram[i]->time;
>>>         overall_size += histogram[i]->size;
>>>       }
>>> +  threshold = 0;
>>>     if (overall_time)
>>>       {
>>> -      gcov_type threshold;
>>> -
>>>         gcc_assert (overall_size);
>>>           cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE) + 
>>> 500) / 1000;
>>> -      threshold = 0;
>>>         for (i = 0; cumulated < cutoff; i++)
>>>       {
>>>         cumulated += histogram[i]->count * histogram[i]->time;
>>> @@ -543,7 +586,7 @@ ipa_profile (void)
>>>     histogram.release ();
>>>     histogram_pool.release ();
>>>   -  /* Produce speculative calls: we saved common traget from porfiling 
>>> into
>>> +  /* Produce speculative calls: we saved common target from profiling into
>>>        e->common_target_id.  Now, at link time, we can look up corresponding
>>>        function node and produce speculative call.  */
>>>   @@ -558,7 +601,8 @@ ipa_profile (void)
>>>       {
>>>         if (n->count.initialized_p ())
>>>           nindirect++;
>>> -      if (e->indirect_info->common_target_id)
>>> +      if (e->indirect_info->common_target_id
>>> +          || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>>>           {
>>>             if (!node_map_initialized)
>>>               init_node_map (false);
>>> @@ -613,7 +657,7 @@ ipa_profile (void)
>>>                 if (dump_file)
>>>               fprintf (dump_file,
>>>                    "Not speculating: "
>>> -                 "parameter count mistmatch\n");
>>> +                 "parameter count mismatch\n");
>>>               }
>>>             else if (e->indirect_info->polymorphic
>>>                  && !opt_for_fn (n->decl, flag_devirtualize)
>>> @@ -655,7 +699,130 @@ ipa_profile (void)
>>>             nunknown++;
>>>           }
>>>           }
>>> -     }
>>> +      if (e->indirect_info && e->indirect_info->num_of_ics > 1)
>>> +        {
>>> +          if (in_lto_p)
>>> +        {
>>> +          if (dump_file)
>>> +            {
>>> +              fprintf (dump_file,
>>> +                   "Updating hotness threshold in LTO mode.\n");
>>> +              fprintf (dump_file, "Updated min count: %" PRId64 "\n",
>>> +                   (int64_t) threshold);
>>> +            }
>>> +          set_hot_bb_threshold (threshold
>>> +                    / e->indirect_info->num_of_ics);
>>> +        }
>>> +          if (!node_map_initialized)
>>> +        init_node_map (false);
>>> +          node_map_initialized = true;
>>> +          ncommon++;
>>> +          unsigned speculative = 0;
>>> +          for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
>>> +        {
>>> +          n2 = find_func_by_profile_id (
>>> +            e->indirect_info->common_target_ids[i]);
>>> +          if (n2)
>>> +            {
>>> +              if (dump_file)
>>> +            {
>>> +              fprintf (
>>> +                dump_file,
>>> +                "Indirect call -> direct call from"
>>> +                " other module %s => %s, prob %3.2f\n",
>>> +                n->dump_name (), n2->dump_name (),
>>> +                e->indirect_info->common_target_probabilities[i]
>>> +                  / (float) REG_BR_PROB_BASE);
>>> +            }
>>> +              if (e->indirect_info->common_target_probabilities[i]
>>> +              < REG_BR_PROB_BASE / 2)
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (
>>> +                  dump_file,
>>> +                  "Not speculating: probability is too low.\n");
>>> +            }
>>> +              else if (!e->maybe_hot_p ())
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: call is cold.\n");
>>> +            }
>>> +              else if (n2->get_availability () <= AVAIL_INTERPOSABLE
>>> +                   && n2->can_be_discarded_p ())
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: target is overwritable "
>>> +                     "and can be discarded.\n");
>>> +            }
>>> +              else if (ipa_node_params_sum && ipa_edge_args_sum
>>> +                   && (!vec_safe_is_empty (
>>> +                 IPA_NODE_REF (n2)->descriptors))
>>> +                   && ipa_get_param_count (IPA_NODE_REF (n2))
>>> +                    != ipa_get_cs_argument_count (
>>> +                      IPA_EDGE_REF (e))
>>> +                   && (ipa_get_param_count (IPA_NODE_REF (n2))
>>> +                     >= ipa_get_cs_argument_count (
>>> +                       IPA_EDGE_REF (e))
>>> +                   || !stdarg_p (TREE_TYPE (n2->decl))))
>>> +            {
>>> +              nmismatch++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file, "Not speculating: "
>>> +                        "parameter count mismatch\n");
>>> +            }
>>> +              else if (e->indirect_info->polymorphic
>>> +                   && !opt_for_fn (n->decl, flag_devirtualize)
>>> +                   && !possible_polymorphic_call_target_p (e, n2))
>>> +            {
>>> +              nimpossible++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: "
>>> +                     "function is not in the polymorphic "
>>> +                     "call target list\n");
>>> +            }
>>> +              else
>>> +            {
>>> +              /* Target may be overwritable, but profile says that
>>> +                 control flow goes to this particular implementation
>>> +                 of N2.  Speculate on the local alias to allow
>>> +                 inlining.
>>> +                 */
>>> +              if (!n2->can_be_discarded_p ())
>>> +                {
>>> +                  cgraph_node *alias;
>>> +                  alias = dyn_cast<cgraph_node *> (
>>> +                n2->noninterposable_alias ());
>>> +                  if (alias)
>>> +                n2 = alias;
>>> +                }
>>> +              nconverted++;
>>> +              e->make_speculative (
>>> +                n2, e->count.apply_probability (
>>> +                  e->indirect_info
>>> +                    ->common_target_probabilities[i]));
>>> +              update = true;
>>> +              speculative++;
>>> +            }
>>> +            }
>>> +          else
>>> +            {
>>> +              if (dump_file)
>>> +            fprintf (dump_file,
>>> +                 "Function with profile-id %i not found.\n",
>>> +                 e->indirect_info->common_target_ids[i]);
>>> +              nunknown++;
>>> +            }
>>> +        }
>>> +          if (speculative < e->indirect_info->num_of_ics)
>>> +        e->indirect_info->num_of_ics = speculative;
>>> +        }
>>> +    }
>>>          if (update)
>>>        ipa_update_overall_fn_summary (n);
>>>        }
>>> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
>>> index 79b250c3943..30347691029 100644
>>> --- a/gcc/ipa-utils.c
>>> +++ b/gcc/ipa-utils.c
>>> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>>>         update_max_bb_count ();
>>>         compute_function_frequency ();
>>>         pop_cfun ();
>>> +      /* When src is speculative, clone the referrings.  */
>>> +      if (src->indirect_call_target)
>>> +    for (e = src->callers; e; e = e->next_caller)
>>> +      if (e->callee == src && e->speculative)
>>> +        dst->clone_referring (src);
>>>         for (e = dst->callees; e; e = e->next_callee)
>>>       {
>>>         if (e->speculative)
>>> diff --git a/gcc/ipa.c b/gcc/ipa.c
>>> index 2496694124c..c1fe081a72d 100644
>>> --- a/gcc/ipa.c
>>> +++ b/gcc/ipa.c
>>> @@ -166,7 +166,7 @@ process_references (symtab_node *snode,
>>>      devirtualization happens.  After inlining still keep their declarations
>>>      around, so we can devirtualize to a direct call.
>>>   -   Also try to make trivial devirutalization when no or only one target 
>>> is
>>> +   Also try to make trivial devirtualization when no or only one target is
>>>      possible.  */
>>>     static void
>>> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
>>> index 4dfa2862be3..0c8f547d44e 100644
>>> --- a/gcc/lto-cgraph.c
>>> +++ b/gcc/lto-cgraph.c
>>> @@ -238,6 +238,7 @@ lto_output_edge (struct lto_simple_output_block *ob, 
>>> struct cgraph_edge *edge,
>>>     unsigned int uid;
>>>     intptr_t ref;
>>>     struct bitpack_d bp;
>>> +  unsigned i;
>>>       if (edge->indirect_unknown_callee)
>>>       streamer_write_enum (ob->main_stream, LTO_symtab_tags, 
>>> LTO_symtab_last_tag,
>>> @@ -296,6 +297,25 @@ lto_output_edge (struct lto_simple_output_block *ob, 
>>> struct cgraph_edge *edge,
>>>         if (edge->indirect_info->common_target_id)
>>>       streamer_write_hwi_stream
>>>          (ob->main_stream, edge->indirect_info->common_target_probability);
>>> +
>>> +      gcc_assert (edge->indirect_info->num_of_ics
>>> +          <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>>> +
>>> +      streamer_write_hwi_stream (ob->main_stream,
>>> +                 edge->indirect_info->num_of_ics);
>>> +
>>> +      if (edge->indirect_info->num_of_ics)
>>> +    {
>>> +      for (i = 0; i < edge->indirect_info->num_of_ics; i++)
>>> +        {
>>> +          streamer_write_hwi_stream (
>>> +        ob->main_stream, edge->indirect_info->common_target_ids[i]);
>>> +          if (edge->indirect_info->common_target_ids[i])
>>> +        streamer_write_hwi_stream (
>>> +          ob->main_stream,
>>> +          edge->indirect_info->common_target_probabilities[i]);
>>> +        }
>>> +    }
>>>       }
>>>   }
>>>   @@ -1438,6 +1458,7 @@ input_edge (struct lto_input_block *ib, 
>>> vec<symtab_node *> nodes,
>>>     cgraph_inline_failed_t inline_failed;
>>>     struct bitpack_d bp;
>>>     int ecf_flags = 0;
>>> +  unsigned i;
>>>       caller = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
>>>     if (caller == NULL || caller->decl == NULL_TREE)
>>> @@ -1488,6 +1509,23 @@ input_edge (struct lto_input_block *ib, 
>>> vec<symtab_node *> nodes,
>>>         edge->indirect_info->common_target_id = streamer_read_hwi (ib);
>>>         if (edge->indirect_info->common_target_id)
>>>           edge->indirect_info->common_target_probability = 
>>> streamer_read_hwi (ib);
>>> +
>>> +      edge->indirect_info->num_of_ics = streamer_read_hwi (ib);
>>> +
>>> +      gcc_assert (edge->indirect_info->num_of_ics
>>> +          <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>>> +
>>> +      if (edge->indirect_info->num_of_ics)
>>> +    {
>>> +      for (i = 0; i < edge->indirect_info->num_of_ics; i++)
>>> +        {
>>> +          edge->indirect_info->common_target_ids[i]
>>> +        = streamer_read_hwi (ib);
>>> +          if (edge->indirect_info->common_target_ids[i])
>>> +        edge->indirect_info->common_target_probabilities[i]
>>> +          = streamer_read_hwi (ib);
>>> +        }
>>> +    }
>>>       }
>>>   }
>>>   diff --git a/gcc/predict.c b/gcc/predict.c
>>> index 43ee91a5b13..b7f38891c72 100644
>>> --- a/gcc/predict.c
>>> +++ b/gcc/predict.c
>>> @@ -763,7 +763,6 @@ dump_prediction (FILE *file, enum br_predictor 
>>> predictor, int probability,
>>>         && bb->count.precise_p ()
>>>         && reason == REASON_NONE)
>>>       {
>>> -      gcc_assert (e->count ().precise_p ());
>>>         fprintf (file, ";;heuristics;%s;%" PRId64 ";%" PRId64 ";%.1f;\n",
>>>              predictor_info[predictor].name,
>>>              bb->count.to_gcov_type (), e->count ().to_gcov_type (),
>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c 
>>> b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>>> new file mode 100644
>>> index 00000000000..e0a83c2e067
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>>> @@ -0,0 +1,35 @@
>>> +/* { dg-require-effective-target lto } */
>>> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
>>> +/* { dg-require-profiling "-fprofile-generate" } */
>>> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param 
>>> indir-call-topn-profile=1" } */
>>> +
>>> +#include <stdio.h>
>>> +
>>> +typedef int (*fptr) (int);
>>> +int
>>> +one (int a);
>>> +
>>> +int
>>> +two (int a);
>>> +
>>> +fptr table[] = {&one, &two};
>>> +
>>> +int
>>> +main()
>>> +{
>>> +  int i, x;
>>> +  fptr p = &one;
>>> +
>>> +  x = one (3);
>>> +
>>> +  for (i = 0; i < 350000000; i++)
>>> +    {
>>> +      x = (*p) (3);
>>> +      p = table[x];
>>> +    }
>>> +  printf ("done:%d\n", x);
>>> +}
>>> +
>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct 
>>> call.* one transformation on insn" "profile_estimate" } } */
>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct 
>>> call.* two transformation on insn" "profile_estimate" } } */
>>> +
>>> diff --git 
>>> a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c 
>>> b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>>> new file mode 100644
>>> index 00000000000..a8c6e365fb9
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>>> @@ -0,0 +1,22 @@
>>> +/* It seems there is no way to avoid the other source of mulitple
>>> +   source testcase from being compiled independently.  Just avoid
>>> +   error.  */
>>> +#ifdef DOJOB
>>> +int
>>> +one (int a)
>>> +{
>>> +  return 1;
>>> +}
>>> +
>>> +int
>>> +two (int a)
>>> +{
>>> +  return 0;
>>> +}
>>> +#else
>>> +int
>>> +main()
>>> +{
>>> +  return 0;
>>> +}
>>> +#endif
>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c 
>>> b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>>> new file mode 100644
>>> index 00000000000..aa3887fde83
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>>> @@ -0,0 +1,42 @@
>>> +/* { dg-require-effective-target lto } */
>>> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
>>> +/* { dg-require-profiling "-fprofile-generate" } */
>>> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param 
>>> indir-call-topn-profile=1" } */
>>> +
>>> +#include <stdio.h>
>>> +
>>> +typedef int (*fptr) (int);
>>> +int
>>> +one (int a);
>>> +
>>> +int
>>> +two (int a);
>>> +
>>> +fptr table[] = {&one, &two};
>>> +
>>> +int foo ()
>>> +{
>>> +  int i, x;
>>> +  fptr p = &one;
>>> +
>>> +  x = one (3);
>>> +
>>> +  for (i = 0; i < 350000000; i++)
>>> +    {
>>> +      x = (*p) (3);
>>> +      p = table[x];
>>> +    }
>>> +  return x;
>>> +}
>>> +
>>> +int
>>> +main()
>>> +{
>>> +  int x = foo ();
>>> +  printf ("done:%d\n", x);
>>> +}
>>> +
>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct 
>>> call.* one transformation on insn" "profile_estimate" } } */
>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct 
>>> call.* two transformation on insn" "profile_estimate" } } */
>>> +
>>> +
>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c 
>>> b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>>> new file mode 100644
>>> index 00000000000..951bc7ddd19
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>>> @@ -0,0 +1,38 @@
>>> +/* { dg-require-profiling "-fprofile-generate" } */
>>> +/* { dg-options "-O2 -fdump-ipa-profile --param indir-call-topn-profile=1" 
>>> } */
>>> +
>>> +#include <stdio.h>
>>> +
>>> +typedef int (*fptr) (int);
>>> +int
>>> +one (int a)
>>> +{
>>> +  return 1;
>>> +}
>>> +
>>> +int
>>> +two (int a)
>>> +{
>>> +  return 0;
>>> +}
>>> +
>>> +fptr table[] = {&one, &two};
>>> +
>>> +int
>>> +main()
>>> +{
>>> +  int i, x;
>>> +  fptr p = &one;
>>> +
>>> +  one (3);
>>> +
>>> +  for (i = 0; i < 350000000; i++)
>>> +    {
>>> +      x = (*p) (3);
>>> +      p = table[x];
>>> +    }
>>> +  printf ("done:%d\n", x);
>>> +}
>>> +
>>> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct 
>>> call.* one transformation on insn" "profile" } } */
>>> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct 
>>> call.* two transformation on insn" "profile" } } */
>>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
>>> index 9017da878b1..f69b31b197e 100644
>>> --- a/gcc/tree-inline.c
>>> +++ b/gcc/tree-inline.c
>>> @@ -2028,43 +2028,66 @@ copy_bb (copy_body_data *id, basic_block bb,
>>>             switch (id->transform_call_graph_edges)
>>>           {
>>>           case CB_CGE_DUPLICATE:
>>> -          edge = id->src_node->get_edge (orig_stmt);
>>> -          if (edge)
>>> -            {
>>> -              struct cgraph_edge *old_edge = edge;
>>> -              profile_count old_cnt = edge->count;
>>> -              edge = edge->clone (id->dst_node, call_stmt,
>>> -                      gimple_uid (stmt),
>>> -                      num, den,
>>> -                      true);
>>> -
>>> -              /* Speculative calls consist of two edges - direct and
>>> -             indirect.  Duplicate the whole thing and distribute
>>> -             frequencies accordingly.  */
>>> -              if (edge->speculative)
>>> -            {
>>> -              struct cgraph_edge *direct, *indirect;
>>> -              struct ipa_ref *ref;
>>> -
>>> -              gcc_assert (!edge->indirect_unknown_callee);
>>> -              old_edge->speculative_call_info (direct, indirect, ref);
>>> -
>>> -              profile_count indir_cnt = indirect->count;
>>> -              indirect = indirect->clone (id->dst_node, call_stmt,
>>> -                              gimple_uid (stmt),
>>> -                              num, den,
>>> -                              true);
>>> -
>>> -              profile_probability prob
>>> -                 = indir_cnt.probability_in (old_cnt + indir_cnt);
>>> -              indirect->count
>>> -                 = copy_basic_block->count.apply_probability (prob);
>>> -              edge->count = copy_basic_block->count - indirect->count;
>>> -              id->dst_node->clone_reference (ref, stmt);
>>> -            }
>>> -              else
>>> -            edge->count = copy_basic_block->count;
>>> -            }
>>> +          {
>>> +            edge = id->src_node->get_edge (orig_stmt);
>>> +            struct cgraph_edge *old_edge = edge;
>>> +            struct cgraph_edge *direct, *indirect;
>>> +            bool next_speculative;
>>> +            do
>>> +              {
>>> +            next_speculative = false;
>>> +            if (edge)
>>> +              {
>>> +                profile_count old_cnt = edge->count;
>>> +                edge
>>> +                  = edge->clone (id->dst_node, call_stmt,
>>> +                         gimple_uid (stmt), num, den, true);
>>> +
>>> +                /* Speculative calls consist of two edges - direct
>>> +                   and indirect.  Duplicate the whole thing and
>>> +                   distribute frequencies accordingly.  */
>>> +                if (edge->speculative)
>>> +                  {
>>> +                struct ipa_ref *ref;
>>> +
>>> +                gcc_assert (!edge->indirect_unknown_callee);
>>> +                old_edge->speculative_call_info (direct,
>>> +                                 indirect, ref);
>>> +
>>> +                profile_count indir_cnt = indirect->count;
>>> +                indirect
>>> +                  = indirect->clone (id->dst_node, call_stmt,
>>> +                             gimple_uid (stmt), num,
>>> +                             den, true);
>>> +
>>> +                profile_probability prob
>>> +                  = indir_cnt.probability_in (old_cnt
>>> +                                  + indir_cnt);
>>> +                indirect->count
>>> +                  = copy_basic_block->count.apply_probability (
>>> +                    prob);
>>> +                edge->count
>>> +                  = copy_basic_block->count - indirect->count;
>>> +                id->dst_node->clone_reference (ref, stmt);
>>> +                  }
>>> +                else
>>> +                  edge->count = copy_basic_block->count;
>>> +              }
>>> +            /* If the indirect call contains more than one indirect
>>> +               targets, need clone all speculative edges here.  */
>>> +            if (old_edge && old_edge->next_callee
>>> +                && old_edge->speculative && indirect
>>> +                && indirect->indirect_info
>>> +                && indirect->indirect_info->num_of_ics > 1)
>>> +              {
>>> +                edge = old_edge->next_callee;
>>> +                old_edge = old_edge->next_callee;
>>> +                if (edge->speculative)
>>> +                  next_speculative = true;
>>> +              }
>>> +              }
>>> +            while (next_speculative);
>>> +          }
>>>             break;
>>>             case CB_CGE_MOVE_CLONES:
>>> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
>>> index 1c3034aac10..4964dbdebb5 100644
>>> --- a/gcc/tree-profile.c
>>> +++ b/gcc/tree-profile.c
>>> @@ -74,8 +74,8 @@ static GTY(()) tree ic_tuple_callee_field;
>>>   /* Do initialization work for the edge profiler.  */
>>>     /* Add code:
>>> -   __thread gcov*    __gcov_indirect_call_counters; // pointer to actual 
>>> counter
>>> -   __thread void*    __gcov_indirect_call_callee; // actual callee address
>>> +   __thread gcov*    __gcov_indirect_call.counters; // pointer to actual 
>>> counter
>>> +   __thread void*    __gcov_indirect_call.callee; // actual callee address
>>>      __thread int __gcov_function_counter; // time profiler function counter
>>>   */
>>>   static void
>>> @@ -395,7 +395,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned 
>>> tag, unsigned base)
>>>         f_1 = foo;
>>>         __gcov_indirect_call.counters = &__gcov4.main[0];
>>>         PROF_9 = f_1;
>>> -      __gcov_indirect_call_callee = PROF_9;
>>> +      __gcov_indirect_call.callee = PROF_9;
>>>         _4 = f_1 ();
>>>      */
>>>   @@ -458,11 +458,11 @@ gimple_gen_ic_func_profiler (void)
>>>       /* Insert code:
>>>   -     if (__gcov_indirect_call_callee != NULL)
>>> +     if (__gcov_indirect_call.callee != NULL)
>>>          __gcov_indirect_call_profiler_v3 (profile_id, 
>>> &current_function_decl);
>>>          The function __gcov_indirect_call_profiler_v3 is responsible for
>>> -     resetting __gcov_indirect_call_callee to NULL.  */
>>> +     resetting __gcov_indirect_call.callee to NULL.  */
>>>       gimple_stmt_iterator gsi = gsi_start_bb (cond_bb);
>>>     void0 = build_int_cst (ptr_type_node, 0);
>>> @@ -904,7 +904,7 @@ pass_ipa_tree_profile::gate (function *)
>>>   {
>>>     /* When profile instrumentation, use or test coverage shall be 
>>> performed.
>>>        But for AutoFDO, this there is no instrumentation, thus this pass is
>>> -     diabled.  */
>>> +     disabled.  */
>>>     return (!in_lto_p && !flag_auto_profile
>>>         && (flag_branch_probabilities || flag_test_coverage
>>>             || profile_arc_flag));
>>> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
>>> index 5013956cf86..4869ab8ccd6 100644
>>> --- a/gcc/value-prof.c
>>> +++ b/gcc/value-prof.c
>>> @@ -579,8 +579,8 @@ free_histograms (struct function *fn)
>>>      somehow.  */
>>>     static bool
>>> -check_counter (gimple *stmt, const char * name,
>>> -           gcov_type *count, gcov_type *all, profile_count bb_count_d)
>>> +check_counter (gimple *stmt, const char *name, gcov_type *count, gcov_type 
>>> *all,
>>> +           profile_count bb_count_d, float ratio = 1.0f)
>>>   {
>>>     gcov_type bb_count = bb_count_d.ipa ().to_gcov_type ();
>>>     if (*all != bb_count || *count > *all)
>>> @@ -599,7 +599,7 @@ check_counter (gimple *stmt, const char * name,
>>>                                "count (%d)\n", name, (int)*all, 
>>> (int)bb_count);
>>>         *all = bb_count;
>>>         if (*count > *all)
>>> -            *count = *all;
>>> +        *count = *all * ratio;
>>>         return false;
>>>       }
>>>         else
>>> @@ -1410,9 +1410,132 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node 
>>> *direct_call,
>>>     return dcall_stmt;
>>>   }
>>>   +/* If --param=indir-call-topn-profile=1 is specified when compiling, 
>>> there maybe
>>> +   multiple indirect targets in histogram.  Check every indirect/virtual 
>>> call
>>> +   if callee function exists, if not exit, leave it to LTO stage for later
>>> +   process.  Modify code of this indirect call to an if-else structure in
>>> +   ipa-profile finally.  */
>>> +static bool
>>> +ic_transform_topn (gimple_stmt_iterator *gsi)
>>> +{
>>> +  unsigned j;
>>> +  gcall *stmt;
>>> +  histogram_value histogram;
>>> +  gcov_type val, count, count_all, all, bb_all;
>>> +  struct cgraph_node *d_call;
>>> +  profile_count bb_count;
>>> +
>>> +  stmt = dyn_cast<gcall *> (gsi_stmt (*gsi));
>>> +  if (!stmt)
>>> +    return false;
>>> +
>>> +  if (gimple_call_fndecl (stmt) != NULL_TREE)
>>> +    return false;
>>> +
>>> +  if (gimple_call_internal_p (stmt))
>>> +    return false;
>>> +
>>> +  histogram
>>> +    = gimple_histogram_value_of_type (cfun, stmt, 
>>> HIST_TYPE_INDIR_CALL_TOPN);
>>> +  if (!histogram)
>>> +    return false;
>>> +
>>> +  count = 0;
>>> +  all = 0;
>>> +  bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
>>> +  bb_count = gimple_bb (stmt)->count;
>>> +
>>> +  /* n_counters need be odd to avoid access violation.  */
>>> +  gcc_assert (histogram->n_counters % 2 == 1);
>>> +
>>> +  /* For indirect call topn, accumulate all the counts first.  */
>>> +  for (j = 1; j < histogram->n_counters; j += 2)
>>> +    {
>>> +      val = histogram->hvalue.counters[j];
>>> +      count = histogram->hvalue.counters[j + 1];
>>> +      if (val)
>>> +    all += count;
>>> +    }
>>> +
>>> +  count_all = all;
>>> +  /* Do the indirect call conversion if function body exists, or else 
>>> leave it
>>> +     to LTO stage.  */
>>> +  for (j = 1; j < histogram->n_counters; j += 2)
>>> +    {
>>> +      val = histogram->hvalue.counters[j];
>>> +      count = histogram->hvalue.counters[j + 1];
>>> +      if (val)
>>> +    {
>>> +      /* The order of CHECK_COUNTER calls is important
>>> +         since check_counter can correct the third parameter
>>> +         and we want to make count <= all <= bb_count.  */
>>> +      if (check_counter (stmt, "ic", &all, &bb_all, bb_count)
>>> +          || check_counter (stmt, "ic", &count, &all,
>>> +                profile_count::from_gcov_type (all),
>>> +                (float) count / count_all))
>>> +        {
>>> +          gimple_remove_histogram_value (cfun, stmt, histogram);
>>> +          return false;
>>> +        }
>>> +
>>> +      d_call = find_func_by_profile_id ((int) val);
>>> +
>>> +      if (d_call == NULL)
>>> +        {
>>> +          if (val)
>>> +        {
>>> +          if (dump_file)
>>> +            {
>>> +              fprintf (
>>> +            dump_file,
>>> +            "Indirect call -> direct call from other module");
>>> +              print_generic_expr (dump_file, gimple_call_fn (stmt),
>>> +                      TDF_SLIM);
>>> +              fprintf (dump_file,
>>> +                   "=> %i (will resolve only with LTO)\n",
>>> +                   (int) val);
>>> +            }
>>> +        }
>>> +          return false;
>>> +        }
>>> +
>>> +      if (!check_ic_target (stmt, d_call))
>>> +        {
>>> +          if (dump_file)
>>> +        {
>>> +          fprintf (dump_file, "Indirect call -> direct call ");
>>> +          print_generic_expr (dump_file, gimple_call_fn (stmt),
>>> +                      TDF_SLIM);
>>> +          fprintf (dump_file, "=> ");
>>> +          print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
>>> +          fprintf (dump_file,
>>> +               " transformation skipped because of type mismatch");
>>> +          print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>> +        }
>>> +          gimple_remove_histogram_value (cfun, stmt, histogram);
>>> +          return false;
>>> +        }
>>> +
>>> +      if (dump_file)
>>> +      {
>>> +        fprintf (dump_file, "Indirect call -> direct call ");
>>> +        print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
>>> +        fprintf (dump_file, "=> ");
>>> +        print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
>>> +        fprintf (dump_file,
>>> +             " transformation on insn postponed to ipa-profile");
>>> +        print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>> +        fprintf (dump_file, "hist->count %" PRId64
>>> +        " hist->all %" PRId64"\n", count, all);
>>> +      }
>>> +    }
>>> +    }
>>> +
>>> +  return true;
>>> +}
>>>   /*
>>>     For every checked indirect/virtual call determine if most common pid of
>>> -  function/class method has probability more than 50%. If yes modify code 
>>> of
>>> +  function/class method has probability more than 50%.  If yes modify code 
>>> of
>>>     this call to:
>>>    */
>>>   @@ -1423,6 +1546,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>>     histogram_value histogram;
>>>     gcov_type val, count, all, bb_all;
>>>     struct cgraph_node *direct_call;
>>> +  enum hist_type type;
>>>       stmt = dyn_cast <gcall *> (gsi_stmt (*gsi));
>>>     if (!stmt)
>>> @@ -1434,18 +1558,24 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>>     if (gimple_call_internal_p (stmt))
>>>       return false;
>>>   -  histogram = gimple_histogram_value_of_type (cfun, stmt, 
>>> HIST_TYPE_INDIR_CALL);
>>> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? 
>>> HIST_TYPE_INDIR_CALL_TOPN
>>> +                             : HIST_TYPE_INDIR_CALL;
>>> +
>>> +  histogram = gimple_histogram_value_of_type (cfun, stmt, type);
>>>     if (!histogram)
>>>       return false;
>>>   +  if (type == HIST_TYPE_INDIR_CALL_TOPN)
>>> +      return ic_transform_topn (gsi);
>>> +
>>>     val = histogram->hvalue.counters [0];
>>>     count = histogram->hvalue.counters [1];
>>>     all = histogram->hvalue.counters [2];
>>>       bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
>>> -  /* The order of CHECK_COUNTER calls is important -
>>> +  /* The order of CHECK_COUNTER calls is important
>>>        since check_counter can correct the third parameter
>>> -     and we want to make count <= all <= bb_all. */
>>> +     and we want to make count <= all <= bb_all.  */
>>>     if (check_counter (stmt, "ic", &all, &bb_all, gimple_bb (stmt)->count)
>>>         || check_counter (stmt, "ic", &count, &all,
>>>                   profile_count::from_gcov_type (all)))
>>> @@ -1494,7 +1624,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>>         print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
>>>         fprintf (dump_file, "=> ");
>>>         print_generic_expr (dump_file, direct_call->decl, TDF_SLIM);
>>> -      fprintf (dump_file, " transformation on insn postponned to 
>>> ipa-profile");
>>> +      fprintf (dump_file, " transformation on insn postponed to 
>>> ipa-profile");
>>>         print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>>         fprintf (dump_file, "hist->count %" PRId64
>>>              " hist->all %" PRId64"\n", count, all);
>>>
>>
> 

Reply via email to