Yes, that would work. So let's discard this patch because the fix for comdat can also fix this problem.
Thanks, Dehao On Tue, Oct 15, 2013 at 8:42 AM, Teresa Johnson <tejohn...@google.com> wrote: > I'm planning to move it to ipa_profile (pass ipa-profile_estimate) and > doing it iteratively. Would that location work? > > Teresa > > On Tue, Oct 15, 2013 at 8:40 AM, Dehao Chen <de...@google.com> wrote: >> Thanks for the pointer to Honza's patch. The patch does exactly what I >> need. But it only resides in the instrumentation based FDO path. Can >> we move the code to more common place (like rebuild_cgraph_edges)? >> >> Thanks, >> Dehao >> >> On Tue, Oct 15, 2013 at 7:59 AM, Teresa Johnson <tejohn...@google.com> wrote: >>> On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <de...@google.com> wrote: >>>> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubi...@ucw.cz> wrote: >>>>>> For my test case, the entire inline instance is optimized away, so >>>>>> there is no info about it in the profile. I can do some fixup in the >>>>>> rebuild_cgraph_edge though. >>>>> >>>>> Yep, I understand that. In this case we should turn PROFILE_READ to >>>>> PROFILE_GUESSED >>>>> and guess the profile when we detect this (i.e. we have edges with non-0 >>>>> counts into >>>>> functions with 0 profile). That should prvent these from getting >>>>> UNLIKELY_EXECUTED >>>>> and they will be inlined normal way. >>>> >>>> Oh, actually in AutoFDO, only functions with samples will be marked as >>>> PROFILE_READ. Others will all be marked as PROFILE_GUESSED. >>> >>> Here is Honza's patch that he was referring to: >>> >>> Index: tree-profile.c >>> =================================================================== >>> --- tree-profile.c (revision 201838) >>> +++ tree-profile.c (working copy) >>> @@ -604,6 +604,34 @@ >>> >>> pop_cfun (); >>> } >>> + /* See if 0 count function has non-0 count callers. In this case we >>> + lost some profile. Drop its function profile to PROFILE_GUESSED. */ >>> + FOR_EACH_DEFINED_FUNCTION (node) >>> + { >>> + struct cgraph_edge *e; >>> + bool called = false; >>> + if (node->count) >>> + continue; >>> + for (e = node->callers; e; e = e->next_caller) >>> + { >>> + if (e->count) >>> + called = true; >>> + if (cgraph_maybe_hot_edge_p (e)) >>> + break; >>> + } >>> + if (e || called >>> + && profile_status_for_function >>> + (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ) >>> + { >>> + if (dump_file) >>> + fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n", >>> + cgraph_node_name (node), node->symbol.order, >>> + e ? "function is hot" : "function is normal"); >>> + profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl)) >>> + = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT); >>> + node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL; >>> + } >>> + } >>> >>> del_node_map(); >>> return 0; >>> Index: predict.c >>> =================================================================== >>> --- predict.c (revision 201838) >>> +++ predict.c (working copy) >>> @@ -2715,6 +2715,9 @@ >>> gcov_type count_max, true_count_max = 0; >>> basic_block bb; >>> >>> + if (!ENTRY_BLOCK_PTR->count) >>> + return 0; >>> + >>> FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb) >>> true_count_max = MAX (bb->count, true_count_max); >>> >>> >>> Which is discussed in this email: >>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html >>> >>> For COMDATs I need to extend this to do it a little later to do it >>> recursively to catch the case of COMDATs feeding other COMDATs and I >>> need to do some other handling to compute counts from the frequencies >>> when inlining. I have been meaning to work on this for awhile but >>> finally am getting to it this week. (Here's the last message from a >>> later thread that forked off the above one: >>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html) >>> >>> In the meantime, perhaps Honza's patch will suffice? >>> >>> Teresa >>> >>>> >>>> Dehao >>>> >>>>> >>>>> Honza >>>>>> >>>>>> Dehao >>>>>> >>>>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davi...@google.com> >>>>>> wrote: >>>>>> > Is it possible to update the callee node summary after profile >>>>>> > annotate (using information from inline instances which are not >>>>>> > inlined in early inline)? >>>>>> > >>>>>> > David >>>>>> > >>>>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <de...@google.com> wrote: >>>>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubi...@ucw.cz> wrote: >>>>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this >>>>>> >>>> could be a potential risk because some callee is marked unlikely >>>>>> >>>> executed simply because they are inlined and eliminated in the O2 >>>>>> >>>> binary. But in ipa-inline it will not get inlined because the edge >>>>>> >>>> is >>>>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is >>>>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot. >>>>>> >>> >>>>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases >>>>>> >>> instead? >>>>>> >>> It seems that having profile set incorrectly will lead to other >>>>>> >>> problems later, too. >>>>>> >>> We discussed similar problem with Teresa about the missing profiles >>>>>> >>> for comdat, >>>>>> >>> basically one should detect these cases as profile being lost and go >>>>>> >>> with guessed >>>>>> >>> profile. (I believe patch for that was posted, too, and so far it >>>>>> >>> seems best approach >>>>>> >>> to this issue) >>>>>> >> >>>>>> >> The current AutoFDO implementation will take all functions that do not >>>>>> >> have have profile as normally executed, thus use guessed profile for >>>>>> >> it. This is like using profile for truly hot functions, and using O2 >>>>>> >> for other functions. This works fine. However, it leads to larger code >>>>>> >> size (approximately 10%~20% larger than FDO). >>>>>> >> >>>>>> >> I'd like to introduce another mode for users who care about both >>>>>> >> performance and code size, and can be sure that profile is >>>>>> >> representative. In this mode, we will mark all functions without >>>>>> >> sample as "unlikely executed". However, because AutoFDO use debug info >>>>>> >> (of optimized code) to represent profile, it's possible that some hot >>>>>> >> functions (say foo) are inlined and fully eliminated into another hot >>>>>> >> function (say bar). So in the profile, bar is cold, and because the >>>>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo >>>>>> >> before the profile annotation. However, after profile annotate, we can >>>>>> >> infer from the bb count that foo->bar is hot, thus it should be >>>>>> >> inlined in ipa-inline phase. However, because bar itself is marked >>>>>> >> UNLIKELY_EXECUTED, it will not be inlined. >>>>>> >> >>>>>> >> One possible workaround would be that during rebuild_cgraph_edges, if >>>>>> >> we find an edge's callee is unlikely executed, add the edge count to >>>>>> >> the callee's count and recalculate callee's frequency. >>>>>> >> >>>>>> >> Dehao >>>>>> >> >>>>>> >>> >>>>>> >>> Honza >>> >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413