Something seems wrong: in tree_function_version:
initialize_cfun (new_decl, old_decl, old_entry_block->count); >From the above we can see new_decl's entry BB's count will be the same as old_decl (no scaling). In copy_bb, new BB's profile count will also be the same as old BBs? Since new_decl's entry count is wrong, updating the cgraph_node's count in cgraph_rebuild_reference will be wrong: node->count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count; // cfun corresponds to new_decl David On Thu, Oct 30, 2014 at 8:49 PM, Wei Mi <w...@google.com> wrote: > We have a program like this: > > A() { // hot func > ... > } > > B() { > A(); // very hot > if (i) { > A(); // very cold > } > } > > Both callsites of A will be inlined into B. In gcc func > save_inline_function_body in inline_transform stage, A's first clone > will be choosen and turned into a real func. For our case, the clone > node choosen corresponds to the cold callsite of A. > cgraph_rebuild_references in tree_function_versioning will reset the > cgraph node count of the choosen clone to the entry bb count of func A > (A is hot). So the cgraph node count of the choosen clone becomes hot > while its inline edge count is still cold. It breaks the assumption > described here: > https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01366.html: > for inline node, bb->count == edge->count == edge->callee->count > > For the patch committed in the thread above (it is listed below), > cg_edge->callee->count is used for profile update to its inline > instance, which leads to a hot BB in func B which is actually very > cold. The wrong profile information causes performance regression in > one of our internal benchmarks. > > Index: gcc/tree-inline.c > =================================================================== > --- gcc/tree-inline.c (revision 210535) > +++ gcc/tree-inline.c (working copy) > @@ -4355,7 +4355,7 @@ expand_call_inline (basic_block bb, gimple stmt, c > function in any way before this point, as this CALL_EXPR may be > a self-referential call; if we're calling ourselves, we need to > duplicate our body before altering anything. */ > - copy_body (id, bb->count, > + copy_body (id, cg_edge->callee->count, > GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE), > bb, return_block, NULL); > > I don't understand why we need to reset the cgraph node count of the > first clone to entry bb count in cgraph_rebuild_references, could you > shed some light on it? The above patch may just uncover the problem? > > Thanks, > Wei.