On Wed, 10 Aug 2016, Richard Biener wrote: > On Wed, 10 Aug 2016, David Malcolm wrote: > > > On Wed, 2016-08-10 at 14:04 +0200, Richard Biener wrote: > > > The following patch adds the ability to record time spent in utility > > > to the pass using it. For things like CFG cleanup, alias stmt > > > walking > > > or DF infrastructure work it is currently not visible which pass is > > > responsible for the time spent there. With -ftime-report-details > > > you now get > > > > > > alias stmt walking : 4.48 ( 2%) usr 0.14 ( 3%) sys 4.66 ( > > > 2%) > > > wall 1209 kB ( 0%) ggc > > > ... > > > tree PRE : 3.58 ( 2%) usr 0.04 ( 1%) sys 3.71 ( > > > 2%) > > > wall 11870 kB ( 1%) ggc > > > `- tree tail merge : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( > > > 0%) > > > wall 0 kB ( 0%) ggc > > > `- alias stmt walking : 0.88 ( 0%) usr 0.00 ( 0%) sys 0.91 ( > > > 0%) > > > wall 153 kB ( 0%) ggc > > > `- loop init : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( > > > 0%) > > > wall 0 kB ( 0%) ggc > > > `- tree SSA incremental : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( > > > 0%) > > > wall 0 kB ( 0%) ggc > > > `- tree operand scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.04 ( > > > 0%) > > > wall 544 kB ( 0%) ggc > > > `- tree STMT verifier : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( > > > 0%) > > > wall 0 kB ( 0%) ggc > > > tree FRE : 3.91 ( 2%) usr 0.09 ( 2%) sys 3.71 ( > > > 2%) > > > wall 5658 kB ( 1%) ggc > > > `- alias stmt walking : 2.81 ( 2%) usr 0.08 ( 2%) sys 2.84 ( > > > 2%) > > > wall 819 kB ( 0%) ggc > > > `- unaccounted todo : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( > > > 0%) > > > wall 851 kB ( 0%) ggc > > > `- tree operand scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.02 ( > > > 0%) > > > wall 0 kB ( 0%) ggc > > > `- dominance computation: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( > > > 0%) > > > wall 0 kB ( 0%) ggc > > > tree code sinking : 0.11 ( 0%) usr 0.02 ( 0%) sys 0.17 ( > > > 0%) > > > wall 1268 kB ( 0%) ggc > > > `- loop init : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.11 ( > > > 0%) > > > wall 526 kB ( 0%) ggc > > > `- dominance computation: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.01 ( > > > 0%) > > > wall 0 kB ( 0%) ggc > > > ... > > > > The example seems to have suffered from line-wrap. Presumably it looks > > like: > > > > alias stmt walking : 4.48 ( 2%) usr 0.14 ( 3%) sys 4.66 ( 2%) > > wall 1209 kB ( 0%) ggc > > ... > > tree PRE : 3.58 ( 2%) usr 0.04 ( 1%) sys 3.71 ( 2%) > > wall 11870 kB ( 1%) ggc > > `- tree tail merge : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) > > wall 0 kB ( 0%) ggc > > `- alias stmt walking : 0.88 ( 0%) usr 0.00 ( 0%) sys 0.91 ( 0%) > > wall 153 kB ( 0%) ggc > > `- loop init : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) > > wall 0 kB ( 0%) ggc > > `- tree SSA incremental : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) > > wall 0 kB ( 0%) ggc > > `- tree operand scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) > > wall 544 kB ( 0%) ggc > > `- tree STMT verifier : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) > > wall 0 kB ( 0%) ggc > > tree FRE : 3.91 ( 2%) usr 0.09 ( 2%) sys 3.71 ( 2%) > > wall 5658 kB ( 1%) ggc > > `- alias stmt walking : 2.81 ( 2%) usr 0.08 ( 2%) sys 2.84 ( 2%) > > wall 819 kB ( 0%) ggc > > `- unaccounted todo : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) > > wall 851 kB ( 0%) ggc > > `- tree operand scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) > > wall 0 kB ( 0%) ggc > > `- dominance computation: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) > > wall 0 kB ( 0%) ggc > > tree code sinking : 0.11 ( 0%) usr 0.02 ( 0%) sys 0.17 ( 0%) > > wall 1268 kB ( 0%) ggc > > `- loop init : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) > > wall 526 kB ( 0%) ggc > > `- dominance computation: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) > > wall 0 kB ( 0%) ggc > > ... > > > > > > (...assuming I can get Evolution to suppress line-wrapping above) > > > > > > BTW, did you test this with jit enabled? The jit "frontend" exposes > > part of the timer API to client code and uses it in some ways that the > > rest of the code doesn't, with its own two-level hierarchy: > > https://gcc.gnu.org/onlinedocs/jit/topics/performance.html#the-timing-api > > If I have to enable jit manually then no, but I made it a timevar-def > map rather than a TV_ID one to catch the named stuff that's probably > there for the JIT. We don't dump the info recorded though. > > > Various other comments inline below > > > > > note that the full time spent in, say, alias stmt walking is still > > > recorded and dumped. Note this also visualizes current nesting > > > of timevars and thus you'll get > > > > > > rest of compilation : 1.02 ( 1%) usr 0.04 ( 1%) sys 1.17 ( > > > 1%) > > > wall 4977 kB ( 0%) ggc > > > `- hard reg cprop : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( > > > 0%) > > > wall 0 kB ( 0%) ggc > > > `- tree conservative DCE: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( > > > 0%) > > > wall 0 kB ( 0%) ggc > > > ... > > > > > > because we have passes w/o timevar that are accumulated in > > > rest of compilation. There is the opportunity to clean this up. > > > > > > I've extended the pass timevars to also cover their TODOs and thus > > > removed TV_TODO. > > > > > > The accounting is "flat", that is, at most one level of sub-timevars > > > are recorded. > > > > > > Does this look sensible and useful given the imperfectness of the > > > current timevar assignments? > > > > > > Bootstrap pending. > > > > > > Thanks, > > > Richard. > > > > > > 2016-08-10 Richard Biener <rguent...@suse.de> > > > > > > * passes.c (execute_todo): Do not push/pop TV_TODO. > > > (execute_one_ipa_transform_pass): Move timevar push/pop TODO > > > execution. > > > (execute_one_pass): Likewise. > > > * common.opt (ftime-report-details): New switch. > > > * doc/invoke.texi (ftime-report-details): Document. > > > * timevar.h (timer::print_row): Adjust signature. > > > (timer::child_map_t): New typedef. > > > (timer::time_var_def): Add children field. > > > * timevar.c (timer::named_items::print): Adjust. > > > (timer::~timer): Free timevar recorded children. > > > (timer::pop_internal): When -ftime-report-details record > > > time spent in sub-timevars. > > > (timer::print_row): Adjust. > > > (timer::print): Print sub-timevar stats. > > > * timevar.def (TV_TODO): Remove. > > > > > > Index: gcc/common.opt > > > =================================================================== > > > --- gcc/common.opt (revision 238469) > > > +++ gcc/common.opt (working copy) > > > @@ -2255,6 +2255,10 @@ ftime-report > > > Common Report Var(time_report) > > > Report the time taken by each compiler pass. > > > > > > +ftime-report-details > > > +Common Report Var(time_report_details) > > > +Record times taken by sub-phases separately. > > > + > > > ftls-model= > > > Common Joined RejectNegative Enum(tls_model) Var(flag_tls_default) > > > Init(TLS_MODEL_GLOBAL_DYNAMIC) > > > -ftls-model=[global-dynamic|local-dynamic|initial-exec|local-exec] > > > Set the default thread-local storage code generation model. > > > Index: gcc/doc/invoke.texi > > > =================================================================== > > > --- gcc/doc/invoke.texi (revision 238469) > > > +++ gcc/doc/invoke.texi (working copy) > > > @@ -548,7 +548,7 @@ Objective-C and Objective-C++ Dialects}. > > > -fprofile-report @gol > > > -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol > > > -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining > > > -verbose @gol > > > --fstats -fstack-usage -ftime-report @gol > > > +-fstats -fstack-usage -ftime-report -ftime-report-details @gol > > > -fvar-tracking-assignments-toggle -gtoggle @gol > > > -print-file-name=@var{library} -print-libgcc-file-name @gol > > > -print-multi-directory -print-multi-lib -print-multi-os-directory > > > @gol > > > @@ -12543,6 +12543,10 @@ print some statistics about each pass wh > > > Makes the compiler print some statistics about the time consumed by > > > each > > > pass when it finishes. > > > > > > +@item -ftime-report > > > +@opindex ftime-report > > > +Record the time consumed by infrastructure parts separately for each > > > pass. > > > + > > > @item -fira-verbose=@var{n} > > > @opindex fira-verbose > > > Control the verbosity of the dump file for the integrated register > > > allocator. > > > Index: gcc/timevar.c > > > =================================================================== > > > --- gcc/timevar.c (revision 238469) > > > +++ gcc/timevar.c (working copy) > > > @@ -205,7 +205,7 @@ timer::named_items::print (FILE *fp, con > > > { > > > timer::timevar_def *def = m_hash_map.get (item_name); > > > gcc_assert (def); > > > - m_timer->print_row (fp, total, def); > > > + m_timer->print_row (fp, total, def->name, def->elapsed); > > > } > > > } > > > > > > @@ -296,6 +296,9 @@ timer::~timer () > > > next = iter->next; > > > free (iter); > > > } > > > + for (unsigned i = 0; i < TIMEVAR_LAST; ++i) > > > + if (m_timevars[i].children) > > > + delete m_timevars[i].children; > > > > Isn't delete NULL a no-op, so is this check redundant? > > Not sure, if it is then consider the check removed. > > > > delete m_jit_client_items; > > > } > > > @@ -399,13 +402,26 @@ timer::pop_internal () > > > /* Attribute the elapsed time to the element we're popping. */ > > > timevar_accumulate (&popped->timevar->elapsed, &m_start_time, > > > &now); > > > > > > + /* Take the item off the stack. */ > > > + m_stack = m_stack->next; > > > + > > > + /* Record the elapsed sub-time to the parent as well. */ > > > + if (m_stack && time_report_details) > > > + { > > > + if (! m_stack->timevar->children) > > > + m_stack->timevar->children = new child_map_t (5); > > > + bool existed_p; > > > + timevar_time_def &time > > > + = m_stack->timevar->children->get_or_insert (popped->timevar, > > > &existed_p); > > > + if (! existed_p) > > > + memset (&time, 0, sizeof (timevar_time_def)); > > > + timevar_accumulate (&time, &m_start_time, &now); > > > + } > > > > Looks like inside a pop there's a lazy allocation of a map, and a > > get_or_insert. Could this make -ftime-report have an impact on the > > timing? > > Yes, of course taking the time via gettimeofday also has. timevars > do have a (low) overhead. > > > > /* Reset the start time; from now on, time is attributed to the > > > element just exposed on the stack. */ > > > m_start_time = now; > > > > > > - /* Take the item off the stack. */ > > > - m_stack = m_stack->next; > > > - > > > /* Don't delete the stack element; instead, add it to the list of > > > unused elements for later use. */ > > > popped->next = m_unused_stack_instances; > > > @@ -619,38 +635,38 @@ timer::validate_phases (FILE *fp) const > > > void > > > timer::print_row (FILE *fp, > > > const timevar_time_def *total, > > > - const timevar_def *tv) > > > + const char *name, const timevar_time_def &elapsed) > > > { > > > /* The timing variable name. */ > > > - fprintf (fp, " %-24s:", tv->name); > > > + fprintf (fp, " %-24s:", name); > > > > > > #ifdef HAVE_USER_TIME > > > /* Print user-mode time for this process. */ > > > fprintf (fp, "%7.2f (%2.0f%%) usr", > > > - tv->elapsed.user, > > > - (total->user == 0 ? 0 : tv->elapsed.user / total->user) * > > > 100); > > > + elapsed.user, > > > + (total->user == 0 ? 0 : elapsed.user / total->user) * > > > 100); > > > #endif /* HAVE_USER_TIME */ > > > > > > #ifdef HAVE_SYS_TIME > > > /* Print system-mode time for this process. */ > > > fprintf (fp, "%7.2f (%2.0f%%) sys", > > > - tv->elapsed.sys, > > > - (total->sys == 0 ? 0 : tv->elapsed.sys / total->sys) * > > > 100); > > > + elapsed.sys, > > > + (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100); > > > #endif /* HAVE_SYS_TIME */ > > > > > > #ifdef HAVE_WALL_TIME > > > /* Print wall clock time elapsed. */ > > > fprintf (fp, "%7.2f (%2.0f%%) wall", > > > - tv->elapsed.wall, > > > - (total->wall == 0 ? 0 : tv->elapsed.wall / total->wall) * > > > 100); > > > + elapsed.wall, > > > + (total->wall == 0 ? 0 : elapsed.wall / total->wall) * > > > 100); > > > #endif /* HAVE_WALL_TIME */ > > > > > > /* Print the amount of ggc memory allocated. */ > > > fprintf (fp, "%8u kB (%2.0f%%) ggc", > > > - (unsigned) (tv->elapsed.ggc_mem >> 10), > > > + (unsigned) (elapsed.ggc_mem >> 10), > > > (total->ggc_mem == 0 > > > ? 0 > > > - : (float) tv->elapsed.ggc_mem / total->ggc_mem) * 100); > > > + : (float) elapsed.ggc_mem / total->ggc_mem) * 100); > > > > > > putc ('\n', fp); > > > } > > > @@ -702,15 +718,47 @@ timer::print (FILE *fp) > > > if (!tv->used) > > > continue; > > > > > > + bool any_childs_with_time = false; > > > > s/childs/children/ > > Fixed. > > > > + if (tv->children) > > > + for (child_map_t::iterator i = tv->children->begin (); > > > + i != tv->children->end (); ++i) > > > + if (!((*i).second.user < tiny > > > + && (*i).second.sys < tiny > > > + && (*i).second.wall < tiny > > > + && (*i).second.ggc_mem < GGC_MEM_BOUND)) > > > + { > > > + any_childs_with_time = true; > > > + break; > > > + } > > > + > > > /* Don't print timing variables if we're going to get a row of > > > - zeroes. */ > > > - if (tv->elapsed.user < tiny > > > + zeroes. Unless there are children with non-zero time. */ > > > + if (! any_childs_with_time > > > + && tv->elapsed.user < tiny > > > && tv->elapsed.sys < tiny > > > && tv->elapsed.wall < tiny > > > && tv->elapsed.ggc_mem < GGC_MEM_BOUND) > > > continue; > > > > Presumably that repeated conditional for a row of zeroes could be moved > > to a helper function (or indeed, a method of the timevar?). > > Good point. > > > > - print_row (fp, total, tv); > > > + print_row (fp, total, tv->name, tv->elapsed); > > > + > > > + if (tv->children) > > > + for (child_map_t::iterator i = tv->children->begin (); > > > + i != tv->children->end (); ++i) > > > + { > > > + timevar_def *tv2 = (*i).first; > > > + /* Don't print timing variables if we're going to get a > > > row of > > > + zeroes. */ > > > + if (!((*i).second.user < tiny > > > + && (*i).second.sys < tiny > > > + && (*i).second.wall < tiny > > > + && (*i).second.ggc_mem < GGC_MEM_BOUND)) > > > > Same here. > > > > > + { > > > + char lname[256]; > > > + sprintf (lname, "`- %s", tv2->name); > > > > The jit has user-supplied names (via gcc_jit_timer_push) and I don't > > think it imposes a length limit, so presumably snprintf would be better > > here. > > > > Alternatively, how about adding a "prefix" param to print_row, or > > simply printing the prefix to fp before calling print_row? > > The latter didn't work for some reason (it at least messes up formatting). > But I can try again.
Ok, so I'd like to have the ':'s aligned so dumping up-front doesn't work. Thus I've ended up using snprintf here. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2016-08-12 Richard Biener <rguent...@suse.de> * passes.c (execute_todo): Do not push/pop TV_TODO. (execute_one_ipa_transform_pass): Move timevar push/pop TODO execution. (execute_one_pass): Likewise. * common.opt (ftime-report-details): New switch. * doc/invoke.texi (ftime-report-details): Document. * timevar.h (timer::print_row): Adjust signature. (timer::all_zero): New static helper. (timer::child_map_t): New typedef. (timer::time_var_def): Add children field. * timevar.c (timer::named_items::print): Adjust. (timer::~timer): Free timevar recorded children. (timer::pop_internal): When -ftime-report-details record time spent in sub-timevars. (timer::print_row): Adjust. (timer::print): Print sub-timevar stats, use all_zero. * timevar.def (TV_TODO): Remove. Index: gcc/passes.c =================================================================== --- gcc/passes.c (revision 238469) +++ gcc/passes.c (working copy) @@ -2002,8 +2002,6 @@ execute_todo (unsigned int flags) && need_ssa_update_p (cfun)) gcc_assert (flags & TODO_update_ssa_any); - timevar_push (TV_TODO); - statistics_fini_pass (); if (flags) @@ -2037,8 +2035,6 @@ execute_todo (unsigned int flags) df problems. */ if (flags & TODO_df_finish) df_finish_pass ((flags & TODO_df_verify) != 0); - - timevar_pop (TV_TODO); } /* Verify invariants that should hold between passes. This is a place @@ -2190,20 +2186,16 @@ execute_one_ipa_transform_pass (struct c pass_init_dump_file (pass); - /* Run pre-pass verification. */ - execute_todo (ipa_pass->function_transform_todo_flags_start); - /* If a timevar is present, start it. */ if (pass->tv_id != TV_NONE) timevar_push (pass->tv_id); + /* Run pre-pass verification. */ + execute_todo (ipa_pass->function_transform_todo_flags_start); + /* Do it! */ todo_after = ipa_pass->function_transform (node); - /* Stop timevar. */ - if (pass->tv_id != TV_NONE) - timevar_pop (pass->tv_id); - if (profile_report && cfun && (cfun->curr_properties & PROP_cfg)) check_profile_consistency (pass->static_pass_number, 0, true); @@ -2213,6 +2205,10 @@ execute_one_ipa_transform_pass (struct c if (profile_report && cfun && (cfun->curr_properties & PROP_cfg)) check_profile_consistency (pass->static_pass_number, 1, true); + /* Stop timevar. */ + if (pass->tv_id != TV_NONE) + timevar_pop (pass->tv_id); + if (dump_file) do_per_function (execute_function_dump, pass); pass_fini_dump_file (pass); @@ -2321,6 +2317,10 @@ execute_one_pass (opt_pass *pass) pass_init_dump_file (pass); + /* If a timevar is present, start it. */ + if (pass->tv_id != TV_NONE) + timevar_push (pass->tv_id); + /* Run pre-pass verification. */ execute_todo (pass->todo_flags_start); @@ -2328,15 +2328,15 @@ execute_one_pass (opt_pass *pass) do_per_function (verify_curr_properties, (void *)(size_t)pass->properties_required); - /* If a timevar is present, start it. */ - if (pass->tv_id != TV_NONE) - timevar_push (pass->tv_id); - /* Do it! */ todo_after = pass->execute (cfun); if (todo_after & TODO_discard_function) { + /* Stop timevar. */ + if (pass->tv_id != TV_NONE) + timevar_pop (pass->tv_id); + pass_fini_dump_file (pass); gcc_assert (cfun); @@ -2363,10 +2363,6 @@ execute_one_pass (opt_pass *pass) do_per_function (clear_last_verified, NULL); - /* Stop timevar. */ - if (pass->tv_id != TV_NONE) - timevar_pop (pass->tv_id); - do_per_function (update_properties_after_pass, pass); if (profile_report && cfun && (cfun->curr_properties & PROP_cfg)) @@ -2378,6 +2374,11 @@ execute_one_pass (opt_pass *pass) check_profile_consistency (pass->static_pass_number, 1, true); verify_interpass_invariants (); + + /* Stop timevar. */ + if (pass->tv_id != TV_NONE) + timevar_pop (pass->tv_id); + if (pass->type == IPA_PASS && ((ipa_opt_pass_d *)pass)->function_transform) { Index: gcc/common.opt =================================================================== --- gcc/common.opt (revision 238469) +++ gcc/common.opt (working copy) @@ -2255,6 +2255,10 @@ ftime-report Common Report Var(time_report) Report the time taken by each compiler pass. +ftime-report-details +Common Report Var(time_report_details) +Record times taken by sub-phases separately. + ftls-model= Common Joined RejectNegative Enum(tls_model) Var(flag_tls_default) Init(TLS_MODEL_GLOBAL_DYNAMIC) -ftls-model=[global-dynamic|local-dynamic|initial-exec|local-exec] Set the default thread-local storage code generation model. Index: gcc/doc/invoke.texi =================================================================== --- gcc/doc/invoke.texi (revision 238469) +++ gcc/doc/invoke.texi (working copy) @@ -548,7 +548,7 @@ Objective-C and Objective-C++ Dialects}. -fprofile-report @gol -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining-verbose @gol --fstats -fstack-usage -ftime-report @gol +-fstats -fstack-usage -ftime-report -ftime-report-details @gol -fvar-tracking-assignments-toggle -gtoggle @gol -print-file-name=@var{library} -print-libgcc-file-name @gol -print-multi-directory -print-multi-lib -print-multi-os-directory @gol @@ -12543,6 +12543,10 @@ print some statistics about each pass wh Makes the compiler print some statistics about the time consumed by each pass when it finishes. +@item -ftime-report-details +@opindex ftime-report-details +Record the time consumed by infrastructure parts separately for each pass. + @item -fira-verbose=@var{n} @opindex fira-verbose Control the verbosity of the dump file for the integrated register allocator. Index: gcc/timevar.c =================================================================== --- gcc/timevar.c (revision 238469) +++ gcc/timevar.c (working copy) @@ -205,7 +205,7 @@ timer::named_items::print (FILE *fp, con { timer::timevar_def *def = m_hash_map.get (item_name); gcc_assert (def); - m_timer->print_row (fp, total, def); + m_timer->print_row (fp, total, def->name, def->elapsed); } } @@ -296,6 +296,8 @@ timer::~timer () next = iter->next; free (iter); } + for (unsigned i = 0; i < TIMEVAR_LAST; ++i) + delete m_timevars[i].children; delete m_jit_client_items; } @@ -399,13 +401,26 @@ timer::pop_internal () /* Attribute the elapsed time to the element we're popping. */ timevar_accumulate (&popped->timevar->elapsed, &m_start_time, &now); + /* Take the item off the stack. */ + m_stack = m_stack->next; + + /* Record the elapsed sub-time to the parent as well. */ + if (m_stack && time_report_details) + { + if (! m_stack->timevar->children) + m_stack->timevar->children = new child_map_t (5); + bool existed_p; + timevar_time_def &time + = m_stack->timevar->children->get_or_insert (popped->timevar, &existed_p); + if (! existed_p) + memset (&time, 0, sizeof (timevar_time_def)); + timevar_accumulate (&time, &m_start_time, &now); + } + /* Reset the start time; from now on, time is attributed to the element just exposed on the stack. */ m_start_time = now; - /* Take the item off the stack. */ - m_stack = m_stack->next; - /* Don't delete the stack element; instead, add it to the list of unused elements for later use. */ popped->next = m_unused_stack_instances; @@ -619,42 +634,54 @@ timer::validate_phases (FILE *fp) const void timer::print_row (FILE *fp, const timevar_time_def *total, - const timevar_def *tv) + const char *name, const timevar_time_def &elapsed) { /* The timing variable name. */ - fprintf (fp, " %-24s:", tv->name); + fprintf (fp, " %-24s:", name); #ifdef HAVE_USER_TIME /* Print user-mode time for this process. */ fprintf (fp, "%7.2f (%2.0f%%) usr", - tv->elapsed.user, - (total->user == 0 ? 0 : tv->elapsed.user / total->user) * 100); + elapsed.user, + (total->user == 0 ? 0 : elapsed.user / total->user) * 100); #endif /* HAVE_USER_TIME */ #ifdef HAVE_SYS_TIME /* Print system-mode time for this process. */ fprintf (fp, "%7.2f (%2.0f%%) sys", - tv->elapsed.sys, - (total->sys == 0 ? 0 : tv->elapsed.sys / total->sys) * 100); + elapsed.sys, + (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100); #endif /* HAVE_SYS_TIME */ #ifdef HAVE_WALL_TIME /* Print wall clock time elapsed. */ fprintf (fp, "%7.2f (%2.0f%%) wall", - tv->elapsed.wall, - (total->wall == 0 ? 0 : tv->elapsed.wall / total->wall) * 100); + elapsed.wall, + (total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100); #endif /* HAVE_WALL_TIME */ /* Print the amount of ggc memory allocated. */ fprintf (fp, "%8u kB (%2.0f%%) ggc", - (unsigned) (tv->elapsed.ggc_mem >> 10), + (unsigned) (elapsed.ggc_mem >> 10), (total->ggc_mem == 0 ? 0 - : (float) tv->elapsed.ggc_mem / total->ggc_mem) * 100); + : (float) elapsed.ggc_mem / total->ggc_mem) * 100); putc ('\n', fp); } +/* Return whether ELAPSED is all zero. */ + +bool +timer::all_zero (const timevar_time_def &elapsed) +{ + const double tiny = 5e-3; + return (elapsed.user < tiny + && elapsed.sys < tiny + && elapsed.wall < tiny + && elapsed.ggc_mem < GGC_MEM_BOUND); +} + /* Summarize timing variables to FP. The timing variable TV_TOTAL has a special meaning -- it's considered to be the total elapsed time, for normalizing the others, and is displayed last. */ @@ -691,7 +718,6 @@ timer::print (FILE *fp) for (id = 0; id < (unsigned int) TIMEVAR_LAST; ++id) { const timevar_def *tv = &m_timevars[(timevar_id_t) id]; - const double tiny = 5e-3; /* Don't print the total execution time here; that goes at the end. */ @@ -702,15 +728,38 @@ timer::print (FILE *fp) if (!tv->used) continue; + bool any_children_with_time = false; + if (tv->children) + for (child_map_t::iterator i = tv->children->begin (); + i != tv->children->end (); ++i) + if (! all_zero ((*i).second)) + { + any_children_with_time = true; + break; + } + /* Don't print timing variables if we're going to get a row of - zeroes. */ - if (tv->elapsed.user < tiny - && tv->elapsed.sys < tiny - && tv->elapsed.wall < tiny - && tv->elapsed.ggc_mem < GGC_MEM_BOUND) + zeroes. Unless there are children with non-zero time. */ + if (! any_children_with_time + && all_zero (tv->elapsed)) continue; - print_row (fp, total, tv); + print_row (fp, total, tv->name, tv->elapsed); + + if (tv->children) + for (child_map_t::iterator i = tv->children->begin (); + i != tv->children->end (); ++i) + { + timevar_def *tv2 = (*i).first; + /* Don't print timing variables if we're going to get a row of + zeroes. */ + if (! all_zero ((*i).second)) + { + char lname[256]; + snprintf (lname, 256, "`- %s", tv2->name); + print_row (fp, total, lname, (*i).second); + } + } } if (m_jit_client_items) m_jit_client_items->print (fp, total); Index: gcc/timevar.h =================================================================== --- gcc/timevar.h (revision 238469) +++ gcc/timevar.h (working copy) @@ -131,9 +131,11 @@ class timer void pop_internal (); static void print_row (FILE *fp, const timevar_time_def *total, - const timevar_def *tv); + const char *name, const timevar_time_def &elapsed); + static bool all_zero (const timevar_time_def &elapsed); private: + typedef hash_map<timevar_def *, timevar_time_def> child_map_t; /* Private type: a timing variable. */ struct timevar_def @@ -155,6 +157,8 @@ class timer /* Nonzero if this timing variable was ever started or pushed onto the timing stack. */ unsigned used : 1; + + child_map_t *children; }; /* Private type: an element on the timing stack