Yes, that would work. So let's discard this patch because the fix for
comdat can also fix this problem.

Thanks,
Dehao

On Tue, Oct 15, 2013 at 8:42 AM, Teresa Johnson <tejohn...@google.com> wrote:
> I'm planning to move it to ipa_profile (pass ipa-profile_estimate) and
> doing it iteratively. Would that location work?
>
> Teresa
>
> On Tue, Oct 15, 2013 at 8:40 AM, Dehao Chen <de...@google.com> wrote:
>> Thanks for the pointer to Honza's patch. The patch does exactly what I
>> need. But it only resides in the instrumentation based FDO path. Can
>> we move the code to more common place (like rebuild_cgraph_edges)?
>>
>> Thanks,
>> Dehao
>>
>> On Tue, Oct 15, 2013 at 7:59 AM, Teresa Johnson <tejohn...@google.com> wrote:
>>> On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <de...@google.com> wrote:
>>>> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>>> For my test case, the entire inline instance is optimized away, so
>>>>>> there is no info about it in the profile. I can do some fixup in the
>>>>>> rebuild_cgraph_edge though.
>>>>>
>>>>> Yep, I understand that.  In this case we should turn PROFILE_READ to 
>>>>> PROFILE_GUESSED
>>>>> and guess the profile when we detect this (i.e. we have edges with non-0 
>>>>> counts into
>>>>> functions with 0 profile).  That should prvent these from getting 
>>>>> UNLIKELY_EXECUTED
>>>>> and they will be inlined normal way.
>>>>
>>>> Oh, actually in AutoFDO, only functions with samples will be marked as
>>>> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>>>
>>> Here is Honza's patch that he was referring to:
>>>
>>> Index: tree-profile.c
>>> ===================================================================
>>> --- tree-profile.c (revision 201838)
>>> +++ tree-profile.c (working copy)
>>> @@ -604,6 +604,34 @@
>>>
>>>        pop_cfun ();
>>>      }
>>> +  /* See if 0 count function has non-0 count callers.  In this case we
>>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>> +    {
>>> +      struct cgraph_edge *e;
>>> +      bool called = false;
>>> +      if (node->count)
>>> + continue;
>>> +      for (e = node->callers; e; e = e->next_caller)
>>> + {
>>> +  if (e->count)
>>> +    called = true;
>>> +  if (cgraph_maybe_hot_edge_p (e))
>>> +    break;
>>> + }
>>> +      if (e || called
>>> +  && profile_status_for_function
>>> +      (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
>>> + {
>>> +  if (dump_file)
>>> +    fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
>>> +     cgraph_node_name (node), node->symbol.order,
>>> +     e ? "function is hot" : "function is normal");
>>> +  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
>>> +    = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>>> +  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>>> + }
>>> +    }
>>>
>>>    del_node_map();
>>>    return 0;
>>> Index: predict.c
>>> ===================================================================
>>> --- predict.c (revision 201838)
>>> +++ predict.c (working copy)
>>> @@ -2715,6 +2715,9 @@
>>>    gcov_type count_max, true_count_max = 0;
>>>    basic_block bb;
>>>
>>> +  if (!ENTRY_BLOCK_PTR->count)
>>> +    return 0;
>>> +
>>>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>>>      true_count_max = MAX (bb->count, true_count_max);
>>>
>>>
>>> Which is discussed in this email:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html
>>>
>>> For COMDATs I need to extend this to do it a little later to do it
>>> recursively to catch the case of COMDATs feeding other COMDATs and I
>>> need to do some other handling to compute counts from the frequencies
>>> when inlining. I have been meaning to work on this for awhile but
>>> finally am getting to it this week. (Here's the last message from a
>>> later thread that forked off the above one:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)
>>>
>>> In the meantime, perhaps Honza's patch will suffice?
>>>
>>> Teresa
>>>
>>>>
>>>> Dehao
>>>>
>>>>>
>>>>> Honza
>>>>>>
>>>>>> Dehao
>>>>>>
>>>>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davi...@google.com> 
>>>>>> wrote:
>>>>>> > Is it possible to update the callee node summary after profile
>>>>>> > annotate (using information from inline instances which are not
>>>>>> > inlined in early inline)?
>>>>>> >
>>>>>> > David
>>>>>> >
>>>>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <de...@google.com> wrote:
>>>>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>>>> >>>> could be a potential risk because some callee is marked unlikely
>>>>>> >>>> executed simply because they are inlined and eliminated in the O2
>>>>>> >>>> binary. But in ipa-inline it will not get inlined because the edge 
>>>>>> >>>> is
>>>>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>>>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>>>>> >>>
>>>>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases 
>>>>>> >>> instead?
>>>>>> >>> It seems that having profile set incorrectly will lead to other 
>>>>>> >>> problems later, too.
>>>>>> >>> We discussed similar problem with Teresa about the missing profiles 
>>>>>> >>> for comdat,
>>>>>> >>> basically one should detect these cases as profile being lost and go 
>>>>>> >>> with guessed
>>>>>> >>> profile.  (I believe patch for that was posted, too, and so far it 
>>>>>> >>> seems best approach
>>>>>> >>> to this issue)
>>>>>> >>
>>>>>> >> The current AutoFDO implementation will take all functions that do not
>>>>>> >> have have profile as normally executed, thus use guessed profile for
>>>>>> >> it. This is like using profile for truly hot functions, and using O2
>>>>>> >> for other functions. This works fine. However, it leads to larger code
>>>>>> >> size (approximately 10%~20% larger than FDO).
>>>>>> >>
>>>>>> >> I'd like to introduce another mode for users who care about both
>>>>>> >> performance and code size, and can be sure that profile is
>>>>>> >> representative. In this mode, we will mark all functions without
>>>>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>>>>> >> (of optimized code) to represent profile, it's possible that some hot
>>>>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>>>>> >> function (say bar). So in the profile, bar is cold, and because the
>>>>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>>>>> >> before the profile annotation. However, after profile annotate, we can
>>>>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>>>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>>>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>>>>> >>
>>>>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>>>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>>>>> >> the callee's count and recalculate callee's frequency.
>>>>>> >>
>>>>>> >> Dehao
>>>>>> >>
>>>>>> >>>
>>>>>> >>> Honza
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Reply via email to