We have a program like this: A() { // hot func ... }
B() { A(); // very hot if (i) { A(); // very cold } } Both callsites of A will be inlined into B. In gcc func save_inline_function_body in inline_transform stage, A's first clone will be choosen and turned into a real func. For our case, the clone node choosen corresponds to the cold callsite of A. cgraph_rebuild_references in tree_function_versioning will reset the cgraph node count of the choosen clone to the entry bb count of func A (A is hot). So the cgraph node count of the choosen clone becomes hot while its inline edge count is still cold. It breaks the assumption described here: https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01366.html: for inline node, bb->count == edge->count == edge->callee->count For the patch committed in the thread above (it is listed below), cg_edge->callee->count is used for profile update to its inline instance, which leads to a hot BB in func B which is actually very cold. The wrong profile information causes performance regression in one of our internal benchmarks. Index: gcc/tree-inline.c =================================================================== --- gcc/tree-inline.c (revision 210535) +++ gcc/tree-inline.c (working copy) @@ -4355,7 +4355,7 @@ expand_call_inline (basic_block bb, gimple stmt, c function in any way before this point, as this CALL_EXPR may be a self-referential call; if we're calling ourselves, we need to duplicate our body before altering anything. */ - copy_body (id, bb->count, + copy_body (id, cg_edge->callee->count, GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE), bb, return_block, NULL); I don't understand why we need to reset the cgraph node count of the first clone to entry bb count in cgraph_rebuild_references, could you shed some light on it? The above patch may just uncover the problem? Thanks, Wei.