On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <[email protected]> wrote:
> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <[email protected]> wrote:
>>> For my test case, the entire inline instance is optimized away, so
>>> there is no info about it in the profile. I can do some fixup in the
>>> rebuild_cgraph_edge though.
>>
>> Yep, I understand that. In this case we should turn PROFILE_READ to
>> PROFILE_GUESSED
>> and guess the profile when we detect this (i.e. we have edges with non-0
>> counts into
>> functions with 0 profile). That should prvent these from getting
>> UNLIKELY_EXECUTED
>> and they will be inlined normal way.
>
> Oh, actually in AutoFDO, only functions with samples will be marked as
> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
Here is Honza's patch that he was referring to:
Index: tree-profile.c
===================================================================
--- tree-profile.c (revision 201838)
+++ tree-profile.c (working copy)
@@ -604,6 +604,34 @@
pop_cfun ();
}
+ /* See if 0 count function has non-0 count callers. In this case we
+ lost some profile. Drop its function profile to PROFILE_GUESSED. */
+ FOR_EACH_DEFINED_FUNCTION (node)
+ {
+ struct cgraph_edge *e;
+ bool called = false;
+ if (node->count)
+ continue;
+ for (e = node->callers; e; e = e->next_caller)
+ {
+ if (e->count)
+ called = true;
+ if (cgraph_maybe_hot_edge_p (e))
+ break;
+ }
+ if (e || called
+ && profile_status_for_function
+ (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
+ {
+ if (dump_file)
+ fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
+ cgraph_node_name (node), node->symbol.order,
+ e ? "function is hot" : "function is normal");
+ profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
+ = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+ node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+ }
+ }
del_node_map();
return 0;
Index: predict.c
===================================================================
--- predict.c (revision 201838)
+++ predict.c (working copy)
@@ -2715,6 +2715,9 @@
gcov_type count_max, true_count_max = 0;
basic_block bb;
+ if (!ENTRY_BLOCK_PTR->count)
+ return 0;
+
FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
true_count_max = MAX (bb->count, true_count_max);
Which is discussed in this email:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html
For COMDATs I need to extend this to do it a little later to do it
recursively to catch the case of COMDATs feeding other COMDATs and I
need to do some other handling to compute counts from the frequencies
when inlining. I have been meaning to work on this for awhile but
finally am getting to it this week. (Here's the last message from a
later thread that forked off the above one:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)
In the meantime, perhaps Honza's patch will suffice?
Teresa
>
> Dehao
>
>>
>> Honza
>>>
>>> Dehao
>>>
>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <[email protected]>
>>> wrote:
>>> > Is it possible to update the callee node summary after profile
>>> > annotate (using information from inline instances which are not
>>> > inlined in early inline)?
>>> >
>>> > David
>>> >
>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <[email protected]> wrote:
>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <[email protected]> wrote:
>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>> >>>> could be a potential risk because some callee is marked unlikely
>>> >>>> executed simply because they are inlined and eliminated in the O2
>>> >>>> binary. But in ipa-inline it will not get inlined because the edge is
>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>> >>>
>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases
>>> >>> instead?
>>> >>> It seems that having profile set incorrectly will lead to other
>>> >>> problems later, too.
>>> >>> We discussed similar problem with Teresa about the missing profiles for
>>> >>> comdat,
>>> >>> basically one should detect these cases as profile being lost and go
>>> >>> with guessed
>>> >>> profile. (I believe patch for that was posted, too, and so far it
>>> >>> seems best approach
>>> >>> to this issue)
>>> >>
>>> >> The current AutoFDO implementation will take all functions that do not
>>> >> have have profile as normally executed, thus use guessed profile for
>>> >> it. This is like using profile for truly hot functions, and using O2
>>> >> for other functions. This works fine. However, it leads to larger code
>>> >> size (approximately 10%~20% larger than FDO).
>>> >>
>>> >> I'd like to introduce another mode for users who care about both
>>> >> performance and code size, and can be sure that profile is
>>> >> representative. In this mode, we will mark all functions without
>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>> >> (of optimized code) to represent profile, it's possible that some hot
>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>> >> function (say bar). So in the profile, bar is cold, and because the
>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>> >> before the profile annotation. However, after profile annotate, we can
>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>> >>
>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>> >> the callee's count and recalculate callee's frequency.
>>> >>
>>> >> Dehao
>>> >>
>>> >>>
>>> >>> Honza
--
Teresa Johnson | Software Engineer | [email protected] | 408-460-2413