The following patch adds the ability to record time spent in utility to the pass using it. For things like CFG cleanup, alias stmt walking or DF infrastructure work it is currently not visible which pass is responsible for the time spent there. With -ftime-report-details you now get
alias stmt walking : 4.48 ( 2%) usr 0.14 ( 3%) sys 4.66 ( 2%) wall 1209 kB ( 0%) ggc ... tree PRE : 3.58 ( 2%) usr 0.04 ( 1%) sys 3.71 ( 2%) wall 11870 kB ( 1%) ggc `- tree tail merge : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc `- alias stmt walking : 0.88 ( 0%) usr 0.00 ( 0%) sys 0.91 ( 0%) wall 153 kB ( 0%) ggc `- loop init : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc `- tree SSA incremental : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc `- tree operand scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 544 kB ( 0%) ggc `- tree STMT verifier : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree FRE : 3.91 ( 2%) usr 0.09 ( 2%) sys 3.71 ( 2%) wall 5658 kB ( 1%) ggc `- alias stmt walking : 2.81 ( 2%) usr 0.08 ( 2%) sys 2.84 ( 2%) wall 819 kB ( 0%) ggc `- unaccounted todo : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 851 kB ( 0%) ggc `- tree operand scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc `- dominance computation: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree code sinking : 0.11 ( 0%) usr 0.02 ( 0%) sys 0.17 ( 0%) wall 1268 kB ( 0%) ggc `- loop init : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 526 kB ( 0%) ggc `- dominance computation: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc ... note that the full time spent in, say, alias stmt walking is still recorded and dumped. Note this also visualizes current nesting of timevars and thus you'll get rest of compilation : 1.02 ( 1%) usr 0.04 ( 1%) sys 1.17 ( 1%) wall 4977 kB ( 0%) ggc `- hard reg cprop : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc `- tree conservative DCE: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc ... because we have passes w/o timevar that are accumulated in rest of compilation. There is the opportunity to clean this up. I've extended the pass timevars to also cover their TODOs and thus removed TV_TODO. The accounting is "flat", that is, at most one level of sub-timevars are recorded. Does this look sensible and useful given the imperfectness of the current timevar assignments? Bootstrap pending. Thanks, Richard. 2016-08-10 Richard Biener <rguent...@suse.de> * passes.c (execute_todo): Do not push/pop TV_TODO. (execute_one_ipa_transform_pass): Move timevar push/pop TODO execution. (execute_one_pass): Likewise. * common.opt (ftime-report-details): New switch. * doc/invoke.texi (ftime-report-details): Document. * timevar.h (timer::print_row): Adjust signature. (timer::child_map_t): New typedef. (timer::time_var_def): Add children field. * timevar.c (timer::named_items::print): Adjust. (timer::~timer): Free timevar recorded children. (timer::pop_internal): When -ftime-report-details record time spent in sub-timevars. (timer::print_row): Adjust. (timer::print): Print sub-timevar stats. * timevar.def (TV_TODO): Remove. Index: gcc/common.opt =================================================================== --- gcc/common.opt (revision 238469) +++ gcc/common.opt (working copy) @@ -2255,6 +2255,10 @@ ftime-report Common Report Var(time_report) Report the time taken by each compiler pass. +ftime-report-details +Common Report Var(time_report_details) +Record times taken by sub-phases separately. + ftls-model= Common Joined RejectNegative Enum(tls_model) Var(flag_tls_default) Init(TLS_MODEL_GLOBAL_DYNAMIC) -ftls-model=[global-dynamic|local-dynamic|initial-exec|local-exec] Set the default thread-local storage code generation model. Index: gcc/doc/invoke.texi =================================================================== --- gcc/doc/invoke.texi (revision 238469) +++ gcc/doc/invoke.texi (working copy) @@ -548,7 +548,7 @@ Objective-C and Objective-C++ Dialects}. -fprofile-report @gol -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining-verbose @gol --fstats -fstack-usage -ftime-report @gol +-fstats -fstack-usage -ftime-report -ftime-report-details @gol -fvar-tracking-assignments-toggle -gtoggle @gol -print-file-name=@var{library} -print-libgcc-file-name @gol -print-multi-directory -print-multi-lib -print-multi-os-directory @gol @@ -12543,6 +12543,10 @@ print some statistics about each pass wh Makes the compiler print some statistics about the time consumed by each pass when it finishes. +@item -ftime-report +@opindex ftime-report +Record the time consumed by infrastructure parts separately for each pass. + @item -fira-verbose=@var{n} @opindex fira-verbose Control the verbosity of the dump file for the integrated register allocator. Index: gcc/timevar.c =================================================================== --- gcc/timevar.c (revision 238469) +++ gcc/timevar.c (working copy) @@ -205,7 +205,7 @@ timer::named_items::print (FILE *fp, con { timer::timevar_def *def = m_hash_map.get (item_name); gcc_assert (def); - m_timer->print_row (fp, total, def); + m_timer->print_row (fp, total, def->name, def->elapsed); } } @@ -296,6 +296,9 @@ timer::~timer () next = iter->next; free (iter); } + for (unsigned i = 0; i < TIMEVAR_LAST; ++i) + if (m_timevars[i].children) + delete m_timevars[i].children; delete m_jit_client_items; } @@ -399,13 +402,26 @@ timer::pop_internal () /* Attribute the elapsed time to the element we're popping. */ timevar_accumulate (&popped->timevar->elapsed, &m_start_time, &now); + /* Take the item off the stack. */ + m_stack = m_stack->next; + + /* Record the elapsed sub-time to the parent as well. */ + if (m_stack && time_report_details) + { + if (! m_stack->timevar->children) + m_stack->timevar->children = new child_map_t (5); + bool existed_p; + timevar_time_def &time + = m_stack->timevar->children->get_or_insert (popped->timevar, &existed_p); + if (! existed_p) + memset (&time, 0, sizeof (timevar_time_def)); + timevar_accumulate (&time, &m_start_time, &now); + } + /* Reset the start time; from now on, time is attributed to the element just exposed on the stack. */ m_start_time = now; - /* Take the item off the stack. */ - m_stack = m_stack->next; - /* Don't delete the stack element; instead, add it to the list of unused elements for later use. */ popped->next = m_unused_stack_instances; @@ -619,38 +635,38 @@ timer::validate_phases (FILE *fp) const void timer::print_row (FILE *fp, const timevar_time_def *total, - const timevar_def *tv) + const char *name, const timevar_time_def &elapsed) { /* The timing variable name. */ - fprintf (fp, " %-24s:", tv->name); + fprintf (fp, " %-24s:", name); #ifdef HAVE_USER_TIME /* Print user-mode time for this process. */ fprintf (fp, "%7.2f (%2.0f%%) usr", - tv->elapsed.user, - (total->user == 0 ? 0 : tv->elapsed.user / total->user) * 100); + elapsed.user, + (total->user == 0 ? 0 : elapsed.user / total->user) * 100); #endif /* HAVE_USER_TIME */ #ifdef HAVE_SYS_TIME /* Print system-mode time for this process. */ fprintf (fp, "%7.2f (%2.0f%%) sys", - tv->elapsed.sys, - (total->sys == 0 ? 0 : tv->elapsed.sys / total->sys) * 100); + elapsed.sys, + (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100); #endif /* HAVE_SYS_TIME */ #ifdef HAVE_WALL_TIME /* Print wall clock time elapsed. */ fprintf (fp, "%7.2f (%2.0f%%) wall", - tv->elapsed.wall, - (total->wall == 0 ? 0 : tv->elapsed.wall / total->wall) * 100); + elapsed.wall, + (total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100); #endif /* HAVE_WALL_TIME */ /* Print the amount of ggc memory allocated. */ fprintf (fp, "%8u kB (%2.0f%%) ggc", - (unsigned) (tv->elapsed.ggc_mem >> 10), + (unsigned) (elapsed.ggc_mem >> 10), (total->ggc_mem == 0 ? 0 - : (float) tv->elapsed.ggc_mem / total->ggc_mem) * 100); + : (float) elapsed.ggc_mem / total->ggc_mem) * 100); putc ('\n', fp); } @@ -702,15 +718,47 @@ timer::print (FILE *fp) if (!tv->used) continue; + bool any_childs_with_time = false; + if (tv->children) + for (child_map_t::iterator i = tv->children->begin (); + i != tv->children->end (); ++i) + if (!((*i).second.user < tiny + && (*i).second.sys < tiny + && (*i).second.wall < tiny + && (*i).second.ggc_mem < GGC_MEM_BOUND)) + { + any_childs_with_time = true; + break; + } + /* Don't print timing variables if we're going to get a row of - zeroes. */ - if (tv->elapsed.user < tiny + zeroes. Unless there are children with non-zero time. */ + if (! any_childs_with_time + && tv->elapsed.user < tiny && tv->elapsed.sys < tiny && tv->elapsed.wall < tiny && tv->elapsed.ggc_mem < GGC_MEM_BOUND) continue; - print_row (fp, total, tv); + print_row (fp, total, tv->name, tv->elapsed); + + if (tv->children) + for (child_map_t::iterator i = tv->children->begin (); + i != tv->children->end (); ++i) + { + timevar_def *tv2 = (*i).first; + /* Don't print timing variables if we're going to get a row of + zeroes. */ + if (!((*i).second.user < tiny + && (*i).second.sys < tiny + && (*i).second.wall < tiny + && (*i).second.ggc_mem < GGC_MEM_BOUND)) + { + char lname[256]; + sprintf (lname, "`- %s", tv2->name); + print_row (fp, total, lname, (*i).second); + } + } } if (m_jit_client_items) m_jit_client_items->print (fp, total); Index: gcc/timevar.h =================================================================== --- gcc/timevar.h (revision 238469) +++ gcc/timevar.h (working copy) @@ -131,9 +131,10 @@ class timer void pop_internal (); static void print_row (FILE *fp, const timevar_time_def *total, - const timevar_def *tv); + const char *name, const timevar_time_def &elapsed); private: + typedef hash_map<timevar_def *, timevar_time_def> child_map_t; /* Private type: a timing variable. */ struct timevar_def @@ -155,6 +156,8 @@ class timer /* Nonzero if this timing variable was ever started or pushed onto the timing stack. */ unsigned used : 1; + + child_map_t *children; }; /* Private type: an element on the timing stack Index: gcc/passes.c =================================================================== *** gcc/passes.c (revision 238469) --- gcc/passes.c (working copy) *************** execute_todo (unsigned int flags) *** 2002,2009 **** && need_ssa_update_p (cfun)) gcc_assert (flags & TODO_update_ssa_any); - timevar_push (TV_TODO); - statistics_fini_pass (); if (flags) --- 2002,2007 ---- *************** execute_todo (unsigned int flags) *** 2037,2044 **** df problems. */ if (flags & TODO_df_finish) df_finish_pass ((flags & TODO_df_verify) != 0); - - timevar_pop (TV_TODO); } /* Verify invariants that should hold between passes. This is a place --- 2035,2040 ---- *************** execute_one_ipa_transform_pass (struct c *** 2190,2209 **** pass_init_dump_file (pass); - /* Run pre-pass verification. */ - execute_todo (ipa_pass->function_transform_todo_flags_start); - /* If a timevar is present, start it. */ if (pass->tv_id != TV_NONE) timevar_push (pass->tv_id); /* Do it! */ todo_after = ipa_pass->function_transform (node); - /* Stop timevar. */ - if (pass->tv_id != TV_NONE) - timevar_pop (pass->tv_id); - if (profile_report && cfun && (cfun->curr_properties & PROP_cfg)) check_profile_consistency (pass->static_pass_number, 0, true); --- 2186,2201 ---- pass_init_dump_file (pass); /* If a timevar is present, start it. */ if (pass->tv_id != TV_NONE) timevar_push (pass->tv_id); + /* Run pre-pass verification. */ + execute_todo (ipa_pass->function_transform_todo_flags_start); + /* Do it! */ todo_after = ipa_pass->function_transform (node); if (profile_report && cfun && (cfun->curr_properties & PROP_cfg)) check_profile_consistency (pass->static_pass_number, 0, true); *************** execute_one_ipa_transform_pass (struct c *** 2213,2218 **** --- 2205,2214 ---- if (profile_report && cfun && (cfun->curr_properties & PROP_cfg)) check_profile_consistency (pass->static_pass_number, 1, true); + /* Stop timevar. */ + if (pass->tv_id != TV_NONE) + timevar_pop (pass->tv_id); + if (dump_file) do_per_function (execute_function_dump, pass); pass_fini_dump_file (pass); *************** execute_one_pass (opt_pass *pass) *** 2321,2326 **** --- 2317,2326 ---- pass_init_dump_file (pass); + /* If a timevar is present, start it. */ + if (pass->tv_id != TV_NONE) + timevar_push (pass->tv_id); + /* Run pre-pass verification. */ execute_todo (pass->todo_flags_start); *************** execute_one_pass (opt_pass *pass) *** 2328,2342 **** do_per_function (verify_curr_properties, (void *)(size_t)pass->properties_required); - /* If a timevar is present, start it. */ - if (pass->tv_id != TV_NONE) - timevar_push (pass->tv_id); - /* Do it! */ todo_after = pass->execute (cfun); if (todo_after & TODO_discard_function) { pass_fini_dump_file (pass); gcc_assert (cfun); --- 2328,2342 ---- do_per_function (verify_curr_properties, (void *)(size_t)pass->properties_required); /* Do it! */ todo_after = pass->execute (cfun); if (todo_after & TODO_discard_function) { + /* Stop timevar. */ + if (pass->tv_id != TV_NONE) + timevar_pop (pass->tv_id); + pass_fini_dump_file (pass); gcc_assert (cfun); *************** execute_one_pass (opt_pass *pass) *** 2363,2372 **** do_per_function (clear_last_verified, NULL); - /* Stop timevar. */ - if (pass->tv_id != TV_NONE) - timevar_pop (pass->tv_id); - do_per_function (update_properties_after_pass, pass); if (profile_report && cfun && (cfun->curr_properties & PROP_cfg)) --- 2363,2368 ---- *************** execute_one_pass (opt_pass *pass) *** 2378,2383 **** --- 2374,2384 ---- check_profile_consistency (pass->static_pass_number, 1, true); verify_interpass_invariants (); + + /* Stop timevar. */ + if (pass->tv_id != TV_NONE) + timevar_pop (pass->tv_id); + if (pass->type == IPA_PASS && ((ipa_opt_pass_d *)pass)->function_transform) { Index: gcc/timevar.def =================================================================== *** gcc/timevar.def (revision 238469) --- gcc/timevar.def (working copy) *************** DEFTIMEVAR (TV_POSTRELOAD , "unacco *** 290,296 **** DEFTIMEVAR (TV_LATE_COMPILATION , "unaccounted late compilation") DEFTIMEVAR (TV_REMOVE_UNUSED , "remove unused locals") DEFTIMEVAR (TV_ADDRESS_TAKEN , "address taken") - DEFTIMEVAR (TV_TODO , "unaccounted todo") DEFTIMEVAR (TV_VERIFY_LOOP_CLOSED , "verify loop closed") DEFTIMEVAR (TV_VERIFY_RTL_SHARING , "verify RTL sharing") DEFTIMEVAR (TV_REBUILD_FREQUENCIES , "rebuild frequencies") --- 290,295 ----