On Wed, 10 Aug 2016, David Malcolm wrote: > On Wed, 2016-08-10 at 14:04 +0200, Richard Biener wrote: > > The following patch adds the ability to record time spent in utility > > to the pass using it. For things like CFG cleanup, alias stmt > > walking > > or DF infrastructure work it is currently not visible which pass is > > responsible for the time spent there. With -ftime-report-details > > you now get > > > > alias stmt walking : 4.48 ( 2%) usr 0.14 ( 3%) sys 4.66 ( > > 2%) > > wall 1209 kB ( 0%) ggc > > ... > > tree PRE : 3.58 ( 2%) usr 0.04 ( 1%) sys 3.71 ( > > 2%) > > wall 11870 kB ( 1%) ggc > > `- tree tail merge : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( > > 0%) > > wall 0 kB ( 0%) ggc > > `- alias stmt walking : 0.88 ( 0%) usr 0.00 ( 0%) sys 0.91 ( > > 0%) > > wall 153 kB ( 0%) ggc > > `- loop init : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( > > 0%) > > wall 0 kB ( 0%) ggc > > `- tree SSA incremental : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( > > 0%) > > wall 0 kB ( 0%) ggc > > `- tree operand scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.04 ( > > 0%) > > wall 544 kB ( 0%) ggc > > `- tree STMT verifier : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( > > 0%) > > wall 0 kB ( 0%) ggc > > tree FRE : 3.91 ( 2%) usr 0.09 ( 2%) sys 3.71 ( > > 2%) > > wall 5658 kB ( 1%) ggc > > `- alias stmt walking : 2.81 ( 2%) usr 0.08 ( 2%) sys 2.84 ( > > 2%) > > wall 819 kB ( 0%) ggc > > `- unaccounted todo : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( > > 0%) > > wall 851 kB ( 0%) ggc > > `- tree operand scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.02 ( > > 0%) > > wall 0 kB ( 0%) ggc > > `- dominance computation: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( > > 0%) > > wall 0 kB ( 0%) ggc > > tree code sinking : 0.11 ( 0%) usr 0.02 ( 0%) sys 0.17 ( > > 0%) > > wall 1268 kB ( 0%) ggc > > `- loop init : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.11 ( > > 0%) > > wall 526 kB ( 0%) ggc > > `- dominance computation: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.01 ( > > 0%) > > wall 0 kB ( 0%) ggc > > ... > > The example seems to have suffered from line-wrap. Presumably it looks > like: > > alias stmt walking : 4.48 ( 2%) usr 0.14 ( 3%) sys 4.66 ( 2%) > wall 1209 kB ( 0%) ggc > ... > tree PRE : 3.58 ( 2%) usr 0.04 ( 1%) sys 3.71 ( 2%) > wall 11870 kB ( 1%) ggc > `- tree tail merge : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) > wall 0 kB ( 0%) ggc > `- alias stmt walking : 0.88 ( 0%) usr 0.00 ( 0%) sys 0.91 ( 0%) > wall 153 kB ( 0%) ggc > `- loop init : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) > wall 0 kB ( 0%) ggc > `- tree SSA incremental : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) > wall 0 kB ( 0%) ggc > `- tree operand scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) > wall 544 kB ( 0%) ggc > `- tree STMT verifier : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) > wall 0 kB ( 0%) ggc > tree FRE : 3.91 ( 2%) usr 0.09 ( 2%) sys 3.71 ( 2%) > wall 5658 kB ( 1%) ggc > `- alias stmt walking : 2.81 ( 2%) usr 0.08 ( 2%) sys 2.84 ( 2%) > wall 819 kB ( 0%) ggc > `- unaccounted todo : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) > wall 851 kB ( 0%) ggc > `- tree operand scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) > wall 0 kB ( 0%) ggc > `- dominance computation: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) > wall 0 kB ( 0%) ggc > tree code sinking : 0.11 ( 0%) usr 0.02 ( 0%) sys 0.17 ( 0%) > wall 1268 kB ( 0%) ggc > `- loop init : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) > wall 526 kB ( 0%) ggc > `- dominance computation: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) > wall 0 kB ( 0%) ggc > ... > > > (...assuming I can get Evolution to suppress line-wrapping above) > > > BTW, did you test this with jit enabled? The jit "frontend" exposes > part of the timer API to client code and uses it in some ways that the > rest of the code doesn't, with its own two-level hierarchy: > https://gcc.gnu.org/onlinedocs/jit/topics/performance.html#the-timing-api
If I have to enable jit manually then no, but I made it a timevar-def map rather than a TV_ID one to catch the named stuff that's probably there for the JIT. We don't dump the info recorded though. > Various other comments inline below > > > note that the full time spent in, say, alias stmt walking is still > > recorded and dumped. Note this also visualizes current nesting > > of timevars and thus you'll get > > > > rest of compilation : 1.02 ( 1%) usr 0.04 ( 1%) sys 1.17 ( > > 1%) > > wall 4977 kB ( 0%) ggc > > `- hard reg cprop : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( > > 0%) > > wall 0 kB ( 0%) ggc > > `- tree conservative DCE: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( > > 0%) > > wall 0 kB ( 0%) ggc > > ... > > > > because we have passes w/o timevar that are accumulated in > > rest of compilation. There is the opportunity to clean this up. > > > > I've extended the pass timevars to also cover their TODOs and thus > > removed TV_TODO. > > > > The accounting is "flat", that is, at most one level of sub-timevars > > are recorded. > > > > Does this look sensible and useful given the imperfectness of the > > current timevar assignments? > > > > Bootstrap pending. > > > > Thanks, > > Richard. > > > > 2016-08-10 Richard Biener <rguent...@suse.de> > > > > * passes.c (execute_todo): Do not push/pop TV_TODO. > > (execute_one_ipa_transform_pass): Move timevar push/pop TODO > > execution. > > (execute_one_pass): Likewise. > > * common.opt (ftime-report-details): New switch. > > * doc/invoke.texi (ftime-report-details): Document. > > * timevar.h (timer::print_row): Adjust signature. > > (timer::child_map_t): New typedef. > > (timer::time_var_def): Add children field. > > * timevar.c (timer::named_items::print): Adjust. > > (timer::~timer): Free timevar recorded children. > > (timer::pop_internal): When -ftime-report-details record > > time spent in sub-timevars. > > (timer::print_row): Adjust. > > (timer::print): Print sub-timevar stats. > > * timevar.def (TV_TODO): Remove. > > > > Index: gcc/common.opt > > =================================================================== > > --- gcc/common.opt (revision 238469) > > +++ gcc/common.opt (working copy) > > @@ -2255,6 +2255,10 @@ ftime-report > > Common Report Var(time_report) > > Report the time taken by each compiler pass. > > > > +ftime-report-details > > +Common Report Var(time_report_details) > > +Record times taken by sub-phases separately. > > + > > ftls-model= > > Common Joined RejectNegative Enum(tls_model) Var(flag_tls_default) > > Init(TLS_MODEL_GLOBAL_DYNAMIC) > > -ftls-model=[global-dynamic|local-dynamic|initial-exec|local-exec] > > Set the default thread-local storage code generation model. > > Index: gcc/doc/invoke.texi > > =================================================================== > > --- gcc/doc/invoke.texi (revision 238469) > > +++ gcc/doc/invoke.texi (working copy) > > @@ -548,7 +548,7 @@ Objective-C and Objective-C++ Dialects}. > > -fprofile-report @gol > > -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol > > -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining > > -verbose @gol > > --fstats -fstack-usage -ftime-report @gol > > +-fstats -fstack-usage -ftime-report -ftime-report-details @gol > > -fvar-tracking-assignments-toggle -gtoggle @gol > > -print-file-name=@var{library} -print-libgcc-file-name @gol > > -print-multi-directory -print-multi-lib -print-multi-os-directory > > @gol > > @@ -12543,6 +12543,10 @@ print some statistics about each pass wh > > Makes the compiler print some statistics about the time consumed by > > each > > pass when it finishes. > > > > +@item -ftime-report > > +@opindex ftime-report > > +Record the time consumed by infrastructure parts separately for each > > pass. > > + > > @item -fira-verbose=@var{n} > > @opindex fira-verbose > > Control the verbosity of the dump file for the integrated register > > allocator. > > Index: gcc/timevar.c > > =================================================================== > > --- gcc/timevar.c (revision 238469) > > +++ gcc/timevar.c (working copy) > > @@ -205,7 +205,7 @@ timer::named_items::print (FILE *fp, con > > { > > timer::timevar_def *def = m_hash_map.get (item_name); > > gcc_assert (def); > > - m_timer->print_row (fp, total, def); > > + m_timer->print_row (fp, total, def->name, def->elapsed); > > } > > } > > > > @@ -296,6 +296,9 @@ timer::~timer () > > next = iter->next; > > free (iter); > > } > > + for (unsigned i = 0; i < TIMEVAR_LAST; ++i) > > + if (m_timevars[i].children) > > + delete m_timevars[i].children; > > Isn't delete NULL a no-op, so is this check redundant? Not sure, if it is then consider the check removed. > > delete m_jit_client_items; > > } > > @@ -399,13 +402,26 @@ timer::pop_internal () > > /* Attribute the elapsed time to the element we're popping. */ > > timevar_accumulate (&popped->timevar->elapsed, &m_start_time, > > &now); > > > > + /* Take the item off the stack. */ > > + m_stack = m_stack->next; > > + > > + /* Record the elapsed sub-time to the parent as well. */ > > + if (m_stack && time_report_details) > > + { > > + if (! m_stack->timevar->children) > > + m_stack->timevar->children = new child_map_t (5); > > + bool existed_p; > > + timevar_time_def &time > > + = m_stack->timevar->children->get_or_insert (popped->timevar, > > &existed_p); > > + if (! existed_p) > > + memset (&time, 0, sizeof (timevar_time_def)); > > + timevar_accumulate (&time, &m_start_time, &now); > > + } > > Looks like inside a pop there's a lazy allocation of a map, and a > get_or_insert. Could this make -ftime-report have an impact on the > timing? Yes, of course taking the time via gettimeofday also has. timevars do have a (low) overhead. > > /* Reset the start time; from now on, time is attributed to the > > element just exposed on the stack. */ > > m_start_time = now; > > > > - /* Take the item off the stack. */ > > - m_stack = m_stack->next; > > - > > /* Don't delete the stack element; instead, add it to the list of > > unused elements for later use. */ > > popped->next = m_unused_stack_instances; > > @@ -619,38 +635,38 @@ timer::validate_phases (FILE *fp) const > > void > > timer::print_row (FILE *fp, > > const timevar_time_def *total, > > - const timevar_def *tv) > > + const char *name, const timevar_time_def &elapsed) > > { > > /* The timing variable name. */ > > - fprintf (fp, " %-24s:", tv->name); > > + fprintf (fp, " %-24s:", name); > > > > #ifdef HAVE_USER_TIME > > /* Print user-mode time for this process. */ > > fprintf (fp, "%7.2f (%2.0f%%) usr", > > - tv->elapsed.user, > > - (total->user == 0 ? 0 : tv->elapsed.user / total->user) * > > 100); > > + elapsed.user, > > + (total->user == 0 ? 0 : elapsed.user / total->user) * > > 100); > > #endif /* HAVE_USER_TIME */ > > > > #ifdef HAVE_SYS_TIME > > /* Print system-mode time for this process. */ > > fprintf (fp, "%7.2f (%2.0f%%) sys", > > - tv->elapsed.sys, > > - (total->sys == 0 ? 0 : tv->elapsed.sys / total->sys) * > > 100); > > + elapsed.sys, > > + (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100); > > #endif /* HAVE_SYS_TIME */ > > > > #ifdef HAVE_WALL_TIME > > /* Print wall clock time elapsed. */ > > fprintf (fp, "%7.2f (%2.0f%%) wall", > > - tv->elapsed.wall, > > - (total->wall == 0 ? 0 : tv->elapsed.wall / total->wall) * > > 100); > > + elapsed.wall, > > + (total->wall == 0 ? 0 : elapsed.wall / total->wall) * > > 100); > > #endif /* HAVE_WALL_TIME */ > > > > /* Print the amount of ggc memory allocated. */ > > fprintf (fp, "%8u kB (%2.0f%%) ggc", > > - (unsigned) (tv->elapsed.ggc_mem >> 10), > > + (unsigned) (elapsed.ggc_mem >> 10), > > (total->ggc_mem == 0 > > ? 0 > > - : (float) tv->elapsed.ggc_mem / total->ggc_mem) * 100); > > + : (float) elapsed.ggc_mem / total->ggc_mem) * 100); > > > > putc ('\n', fp); > > } > > @@ -702,15 +718,47 @@ timer::print (FILE *fp) > > if (!tv->used) > > continue; > > > > + bool any_childs_with_time = false; > > s/childs/children/ Fixed. > > + if (tv->children) > > + for (child_map_t::iterator i = tv->children->begin (); > > + i != tv->children->end (); ++i) > > + if (!((*i).second.user < tiny > > + && (*i).second.sys < tiny > > + && (*i).second.wall < tiny > > + && (*i).second.ggc_mem < GGC_MEM_BOUND)) > > + { > > + any_childs_with_time = true; > > + break; > > + } > > + > > /* Don't print timing variables if we're going to get a row of > > - zeroes. */ > > - if (tv->elapsed.user < tiny > > + zeroes. Unless there are children with non-zero time. */ > > + if (! any_childs_with_time > > + && tv->elapsed.user < tiny > > && tv->elapsed.sys < tiny > > && tv->elapsed.wall < tiny > > && tv->elapsed.ggc_mem < GGC_MEM_BOUND) > > continue; > > Presumably that repeated conditional for a row of zeroes could be moved > to a helper function (or indeed, a method of the timevar?). Good point. > > - print_row (fp, total, tv); > > + print_row (fp, total, tv->name, tv->elapsed); > > + > > + if (tv->children) > > + for (child_map_t::iterator i = tv->children->begin (); > > + i != tv->children->end (); ++i) > > + { > > + timevar_def *tv2 = (*i).first; > > + /* Don't print timing variables if we're going to get a > > row of > > + zeroes. */ > > + if (!((*i).second.user < tiny > > + && (*i).second.sys < tiny > > + && (*i).second.wall < tiny > > + && (*i).second.ggc_mem < GGC_MEM_BOUND)) > > Same here. > > > + { > > + char lname[256]; > > + sprintf (lname, "`- %s", tv2->name); > > The jit has user-supplied names (via gcc_jit_timer_push) and I don't > think it imposes a length limit, so presumably snprintf would be better > here. > > Alternatively, how about adding a "prefix" param to print_row, or > simply printing the prefix to fp before calling print_row? The latter didn't work for some reason (it at least messes up formatting). But I can try again. Note that gcc_jit_timer_push timevars are not processed in this loop. Also fixed the doc issue noticed by Alex now. Richard. > > + print_row (fp, total, lname, (*i).second); > > + } > > + } > > } > > if (m_jit_client_items) > > m_jit_client_items->print (fp, total); > > Index: gcc/timevar.h > > =================================================================== > > --- gcc/timevar.h (revision 238469) > > +++ gcc/timevar.h (working copy) > > @@ -131,9 +131,10 @@ class timer > > void pop_internal (); > > static void print_row (FILE *fp, > > const timevar_time_def *total, > > - const timevar_def *tv); > > + const char *name, const timevar_time_def > > &elapsed); > > > > private: > > + typedef hash_map<timevar_def *, timevar_time_def> child_map_t; > > > > /* Private type: a timing variable. */ > > struct timevar_def > > @@ -155,6 +156,8 @@ class timer > > /* Nonzero if this timing variable was ever started or pushed > > onto > > the timing stack. */ > > unsigned used : 1; > > + > > + child_map_t *children; > > }; > > > > /* Private type: an element on the timing stack > > Index: gcc/passes.c > > =================================================================== > > *** gcc/passes.c (revision 238469) > > --- gcc/passes.c (working copy) > > *************** execute_todo (unsigned int flags) > > *** 2002,2009 **** > > && need_ssa_update_p (cfun)) > > gcc_assert (flags & TODO_update_ssa_any); > > > > - timevar_push (TV_TODO); > > - > > statistics_fini_pass (); > > > > if (flags) > > --- 2002,2007 ---- > > *************** execute_todo (unsigned int flags) > > *** 2037,2044 **** > > df problems. */ > > if (flags & TODO_df_finish) > > df_finish_pass ((flags & TODO_df_verify) != 0); > > - > > - timevar_pop (TV_TODO); > > } > > > > /* Verify invariants that should hold between passes. This is a > > place > > --- 2035,2040 ---- > > *************** execute_one_ipa_transform_pass (struct c > > *** 2190,2209 **** > > > > pass_init_dump_file (pass); > > > > - /* Run pre-pass verification. */ > > - execute_todo (ipa_pass->function_transform_todo_flags_start); > > - > > /* If a timevar is present, start it. */ > > if (pass->tv_id != TV_NONE) > > timevar_push (pass->tv_id); > > > > /* Do it! */ > > todo_after = ipa_pass->function_transform (node); > > > > - /* Stop timevar. */ > > - if (pass->tv_id != TV_NONE) > > - timevar_pop (pass->tv_id); > > - > > if (profile_report && cfun && (cfun->curr_properties & PROP_cfg)) > > check_profile_consistency (pass->static_pass_number, 0, true); > > > > --- 2186,2201 ---- > > > > pass_init_dump_file (pass); > > > > /* If a timevar is present, start it. */ > > if (pass->tv_id != TV_NONE) > > timevar_push (pass->tv_id); > > > > + /* Run pre-pass verification. */ > > + execute_todo (ipa_pass->function_transform_todo_flags_start); > > + > > /* Do it! */ > > todo_after = ipa_pass->function_transform (node); > > > > if (profile_report && cfun && (cfun->curr_properties & PROP_cfg)) > > check_profile_consistency (pass->static_pass_number, 0, true); > > > > *************** execute_one_ipa_transform_pass (struct c > > *** 2213,2218 **** > > --- 2205,2214 ---- > > if (profile_report && cfun && (cfun->curr_properties & PROP_cfg)) > > check_profile_consistency (pass->static_pass_number, 1, true); > > > > + /* Stop timevar. */ > > + if (pass->tv_id != TV_NONE) > > + timevar_pop (pass->tv_id); > > + > > if (dump_file) > > do_per_function (execute_function_dump, pass); > > pass_fini_dump_file (pass); > > *************** execute_one_pass (opt_pass *pass) > > *** 2321,2326 **** > > --- 2317,2326 ---- > > > > pass_init_dump_file (pass); > > > > + /* If a timevar is present, start it. */ > > + if (pass->tv_id != TV_NONE) > > + timevar_push (pass->tv_id); > > + > > /* Run pre-pass verification. */ > > execute_todo (pass->todo_flags_start); > > > > *************** execute_one_pass (opt_pass *pass) > > *** 2328,2342 **** > > do_per_function (verify_curr_properties, > > (void *)(size_t)pass->properties_required); > > > > - /* If a timevar is present, start it. */ > > - if (pass->tv_id != TV_NONE) > > - timevar_push (pass->tv_id); > > - > > /* Do it! */ > > todo_after = pass->execute (cfun); > > > > if (todo_after & TODO_discard_function) > > { > > pass_fini_dump_file (pass); > > > > gcc_assert (cfun); > > --- 2328,2342 ---- > > do_per_function (verify_curr_properties, > > (void *)(size_t)pass->properties_required); > > > > /* Do it! */ > > todo_after = pass->execute (cfun); > > > > if (todo_after & TODO_discard_function) > > { > > + /* Stop timevar. */ > > + if (pass->tv_id != TV_NONE) > > + timevar_pop (pass->tv_id); > > + > > pass_fini_dump_file (pass); > > > > gcc_assert (cfun); > > *************** execute_one_pass (opt_pass *pass) > > *** 2363,2372 **** > > > > do_per_function (clear_last_verified, NULL); > > > > - /* Stop timevar. */ > > - if (pass->tv_id != TV_NONE) > > - timevar_pop (pass->tv_id); > > - > > do_per_function (update_properties_after_pass, pass); > > > > if (profile_report && cfun && (cfun->curr_properties & PROP_cfg)) > > --- 2363,2368 ---- > > *************** execute_one_pass (opt_pass *pass) > > *** 2378,2383 **** > > --- 2374,2384 ---- > > check_profile_consistency (pass->static_pass_number, 1, true); > > > > verify_interpass_invariants (); > > + > > + /* Stop timevar. */ > > + if (pass->tv_id != TV_NONE) > > + timevar_pop (pass->tv_id); > > + > > if (pass->type == IPA_PASS > > && ((ipa_opt_pass_d *)pass)->function_transform) > > { > > Index: gcc/timevar.def > > =================================================================== > > *** gcc/timevar.def (revision 238469) > > --- gcc/timevar.def (working copy) > > *************** DEFTIMEVAR (TV_POSTRELOAD , "unacco > > *** 290,296 **** > > DEFTIMEVAR (TV_LATE_COMPILATION , "unaccounted late > > compilation") > > DEFTIMEVAR (TV_REMOVE_UNUSED , "remove unused locals") > > DEFTIMEVAR (TV_ADDRESS_TAKEN , "address taken") > > - DEFTIMEVAR (TV_TODO , "unaccounted todo") > > DEFTIMEVAR (TV_VERIFY_LOOP_CLOSED , "verify loop closed") > > DEFTIMEVAR (TV_VERIFY_RTL_SHARING , "verify RTL sharing") > > DEFTIMEVAR (TV_REBUILD_FREQUENCIES , "rebuild frequencies") > > --- 290,295 ---- > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)