On Wed, Apr 24, 2013 at 6:59 AM, Jakub Jelinek wrote:
> Also, don't some function start in cold section and then switch into hot
> section?

Yes, this can happen, and there is nothing in the
find_rarely_executed_basic_blocks_and_crossing_edges algorithm to
prevent it. It's not supposed to happen, though, and it is only
possible to trigger the problem if your profile is based on multiple
runs with different test inputs.

Currently the decision about which partition a basic block is assigned
to is based on the return value of probably_never_executed_bb_p. The
relevant code looks like this:

static vec<edge>
find_rarely_executed_basic_blocks_and_crossing_edges (void)
{
  ...
  FOR_EACH_BB (bb)
    {
      if (probably_never_executed_bb_p (cfun, bb))
        BB_SET_PARTITION (bb, BB_COLD_PARTITION);
      else
        BB_SET_PARTITION (bb, BB_HOT_PARTITION);
    }
  ....
}

bool
probably_never_executed_bb_p (struct function *fun, const_basic_block bb)
{
  if (profile_info && flag_branch_probabilities)
    return ((bb->count + profile_info->runs / 2) / profile_info->runs) == 0;
  ...
  return false;
}

Consider a test case which has, say, profile_info->runs==6, and a
function in the test case that is only used in one of the runs, so that
bb->count==1 for its entry block. Then (1 + 6/2) / 6 == 0 in integer
arithmetic, so the entry block will be cold, while the loop blocks with
their large counts stay hot. The supposed-to-be-imposed rule that a hot
region is never dominated by a cold region is broken. See attached test
case with resulting .dot file.

IMHO this is a bug in the bbpart implementation, and the checking code
I proposed will expose these bugs. What
find_rarely_executed_basic_blocks_and_crossing_edges should do is
identify hot regions and connect them to the entry block and (usually)
to the exit block. From what I understand from Teresa's patches, this
is what she has implemented.

I don't know if we can trigger this situation with the current test
infrastructure.

Ciao!
Steven


$ cat t.c
#define N 1024*1024*1024
unsigned int a[N];

void __attribute__((__noinline__,__noclone__))
foo (void)
{
  unsigned int i;
  for (i = 0; i < N; i = i + 2)
    a[i] = i % 19;
}

int
main (int argc, char *argv[] __attribute__((__unused__)))
{
  if (argc > 1)
    foo ();
  return 0;
}
$ ./xgcc -B. -isystem ./include -O2 -fprofile-generate t.c
$ for i in 1 2 3 4 5 ; do ./a.out ; done
$ ./a.out 1
$ ./xgcc -B. -isystem ./include -O2 -fprofile-use t.c -fdump-rtl-bbpart{,-graph} -freorder-blocks-and-partition

Attachment: t.c.199r.bbpart.dot