On Wed, 10 Aug 2016, Richard Biener wrote:

> On Wed, 10 Aug 2016, David Malcolm wrote:
> 
> > On Wed, 2016-08-10 at 14:04 +0200, Richard Biener wrote:
> > > The following patch adds the ability to record time spent in utility
> > > to the pass using it.  For things like CFG cleanup, alias stmt
> > > walking
> > > or DF infrastructure work it is currently not visible which pass is
> > > responsible for the time spent there.  With -ftime-report-details
> > > you now get
> > > 
> > >  alias stmt walking      :   4.48 ( 2%) usr   0.14 ( 3%) sys   4.66 (
> > > 2%) 
> > > wall    1209 kB ( 0%) ggc
> > > ...
> > >  tree PRE                :   3.58 ( 2%) usr   0.04 ( 1%) sys   3.71 (
> > > 2%) 
> > > wall   11870 kB ( 1%) ggc
> > >  `- tree tail merge      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> > > 0%) 
> > > wall       0 kB ( 0%) ggc
> > >  `- alias stmt walking   :   0.88 ( 0%) usr   0.00 ( 0%) sys   0.91 (
> > > 0%) 
> > > wall     153 kB ( 0%) ggc
> > >  `- loop init            :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 (
> > > 0%) 
> > > wall       0 kB ( 0%) ggc
> > >  `- tree SSA incremental :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 (
> > > 0%) 
> > > wall       0 kB ( 0%) ggc
> > >  `- tree operand scan    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.04 (
> > > 0%) 
> > > wall     544 kB ( 0%) ggc
> > >  `- tree STMT verifier   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> > > 0%) 
> > > wall       0 kB ( 0%) ggc
> > >  tree FRE                :   3.91 ( 2%) usr   0.09 ( 2%) sys   3.71 (
> > > 2%) 
> > > wall    5658 kB ( 1%) ggc
> > >  `- alias stmt walking   :   2.81 ( 2%) usr   0.08 ( 2%) sys   2.84 (
> > > 2%) 
> > > wall     819 kB ( 0%) ggc
> > >  `- unaccounted todo     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 (
> > > 0%) 
> > > wall     851 kB ( 0%) ggc
> > >  `- tree operand scan    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> > > 0%) 
> > > wall       0 kB ( 0%) ggc
> > >  `- dominance computation:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> > > 0%) 
> > > wall       0 kB ( 0%) ggc
> > >  tree code sinking       :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.17 (
> > > 0%) 
> > > wall    1268 kB ( 0%) ggc
> > >  `- loop init            :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.11 (
> > > 0%) 
> > > wall     526 kB ( 0%) ggc
> > >  `- dominance computation:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> > > 0%) 
> > > wall       0 kB ( 0%) ggc
> > > ...
> > 
> > The example seems to have suffered from line-wrap.  Presumably it looks
> > like:
> > 
> >  alias stmt walking      :   4.48 ( 2%) usr   0.14 ( 3%) sys   4.66 ( 2%) 
> > wall    1209 kB ( 0%) ggc
> > ...
> >  tree PRE                :   3.58 ( 2%) usr   0.04 ( 1%) sys   3.71 ( 2%) 
> > wall   11870 kB ( 1%) ggc
> >  `- tree tail merge      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
> > wall       0 kB ( 0%) ggc
> >  `- alias stmt walking   :   0.88 ( 0%) usr   0.00 ( 0%) sys   0.91 ( 0%) 
> > wall     153 kB ( 0%) ggc
> >  `- loop init            :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) 
> > wall       0 kB ( 0%) ggc
> >  `- tree SSA incremental :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) 
> > wall       0 kB ( 0%) ggc
> >  `- tree operand scan    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) 
> > wall     544 kB ( 0%) ggc
> >  `- tree STMT verifier   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
> > wall       0 kB ( 0%) ggc
> >  tree FRE                :   3.91 ( 2%) usr   0.09 ( 2%) sys   3.71 ( 2%) 
> > wall    5658 kB ( 1%) ggc
> >  `- alias stmt walking   :   2.81 ( 2%) usr   0.08 ( 2%) sys   2.84 ( 2%) 
> > wall     819 kB ( 0%) ggc
> >  `- unaccounted todo     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) 
> > wall     851 kB ( 0%) ggc
> >  `- tree operand scan    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
> > wall       0 kB ( 0%) ggc
> >  `- dominance computation:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
> > wall       0 kB ( 0%) ggc
> >  tree code sinking       :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.17 ( 0%) 
> > wall    1268 kB ( 0%) ggc
> >  `- loop init            :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) 
> > wall     526 kB ( 0%) ggc
> >  `- dominance computation:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
> > wall       0 kB ( 0%) ggc
> > ...
> > 
> > 
> > (...assuming I can get Evolution to suppress line-wrapping above)
> > 
> > 
> > BTW, did you test this with jit enabled?  The jit "frontend" exposes
> > part of the timer API to client code and uses it in some ways that the
> > rest of the code doesn't, with its own two-level hierarchy:
> > https://gcc.gnu.org/onlinedocs/jit/topics/performance.html#the-timing-api
> 
> If I have to enable jit manually then no, but I made it a timevar-def
> map rather than a TV_ID one to catch the named stuff that's probably
> there for the JIT.  We don't dump the info recorded though.
> 
> > Various other comments inline below
> > 
> > > note that the full time spent in, say, alias stmt walking is still
> > > recorded and dumped.  Note this also visualizes current nesting
> > > of timevars and thus you'll get
> > > 
> > >  rest of compilation     :   1.02 ( 1%) usr   0.04 ( 1%) sys   1.17 (
> > > 1%) 
> > > wall    4977 kB ( 0%) ggc
> > >  `- hard reg cprop       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> > > 0%) 
> > > wall       0 kB ( 0%) ggc
> > >  `- tree conservative DCE:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> > > 0%) 
> > > wall       0 kB ( 0%) ggc
> > > ...
> > > 
> > > because we have passes w/o timevar that are accumulated in
> > > rest of compilation.  There is the opportunity to clean this up.
> > > 
> > > I've extended the pass timevars to also cover their TODOs and thus
> > > removed TV_TODO.
> > > 
> > > The accounting is "flat", that is, at most one level of sub-timevars
> > > are recorded.
> > > 
> > > Does this look sensible and useful given the imperfectness of the
> > > current timevar assignments?
> > > 
> > > Bootstrap pending.
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > 2016-08-10  Richard Biener  <rguent...@suse.de>
> > > 
> > >         * passes.c (execute_todo): Do not push/pop TV_TODO.
> > >         (execute_one_ipa_transform_pass): Move timevar push/pop TODO
> > > execution.
> > >         (execute_one_pass): Likewise.
> > >         * common.opt (ftime-report-details): New switch.
> > >         * doc/invoke.texi (ftime-report-details): Document.
> > >         * timevar.h (timer::print_row): Adjust signature.
> > >         (timer::child_map_t): New typedef.
> > >         (timer::time_var_def): Add children field.
> > >         * timevar.c (timer::named_items::print): Adjust.
> > >         (timer::~timer): Free timevar recorded children.
> > >         (timer::pop_internal): When -ftime-report-details record
> > >         time spent in sub-timevars.
> > >         (timer::print_row): Adjust.
> > >         (timer::print): Print sub-timevar stats.
> > >         * timevar.def (TV_TODO): Remove.
> > > 
> > > Index: gcc/common.opt
> > > ===================================================================
> > > --- gcc/common.opt      (revision 238469)
> > > +++ gcc/common.opt      (working copy)
> > > @@ -2255,6 +2255,10 @@ ftime-report
> > >  Common Report Var(time_report)
> > >  Report the time taken by each compiler pass.
> > >  
> > > +ftime-report-details
> > > +Common Report Var(time_report_details)
> > > +Record times taken by sub-phases separately.
> > > +
> > >  ftls-model=
> > >  Common Joined RejectNegative Enum(tls_model) Var(flag_tls_default)
> > > Init(TLS_MODEL_GLOBAL_DYNAMIC)
> > >  -ftls-model=[global-dynamic|local-dynamic|initial-exec|local-exec]  
> > >    Set the default thread-local storage code generation model.
> > > Index: gcc/doc/invoke.texi
> > > ===================================================================
> > > --- gcc/doc/invoke.texi (revision 238469)
> > > +++ gcc/doc/invoke.texi (working copy)
> > > @@ -548,7 +548,7 @@ Objective-C and Objective-C++ Dialects}.
> > >  -fprofile-report @gol
> > >  -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol
> > >  -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining
> > > -verbose @gol
> > > --fstats  -fstack-usage  -ftime-report @gol
> > > +-fstats  -fstack-usage  -ftime-report -ftime-report-details @gol
> > >  -fvar-tracking-assignments-toggle -gtoggle @gol
> > >  -print-file-name=@var{library}  -print-libgcc-file-name @gol
> > >  -print-multi-directory  -print-multi-lib  -print-multi-os-directory
> > > @gol
> > > @@ -12543,6 +12543,10 @@ print some statistics about each pass wh
> > >  Makes the compiler print some statistics about the time consumed by
> > > each
> > >  pass when it finishes.
> > >  
> > > +@item -ftime-report
> > > +@opindex ftime-report
> > > +Record the time consumed by infrastructure parts separately for each
> > > pass.
> > > +
> > >  @item -fira-verbose=@var{n}
> > >  @opindex fira-verbose
> > >  Control the verbosity of the dump file for the integrated register
> > > allocator.
> > > Index: gcc/timevar.c
> > > ===================================================================
> > > --- gcc/timevar.c       (revision 238469)
> > > +++ gcc/timevar.c       (working copy)
> > > @@ -205,7 +205,7 @@ timer::named_items::print (FILE *fp, con
> > >      {
> > >        timer::timevar_def *def = m_hash_map.get (item_name);
> > >        gcc_assert (def);
> > > -      m_timer->print_row (fp, total, def);
> > > +      m_timer->print_row (fp, total, def->name, def->elapsed);
> > >      }
> > >  }
> > >  
> > > @@ -296,6 +296,9 @@ timer::~timer ()
> > >        next = iter->next;
> > >        free (iter);
> > >      }
> > > +  for (unsigned i = 0; i < TIMEVAR_LAST; ++i)
> > > +    if (m_timevars[i].children)
> > > +      delete m_timevars[i].children;
> > 
> > Isn't delete NULL a no-op, so is this check redundant?
> 
> Not sure, if it is then consider the check removed.
> 
> > >    delete m_jit_client_items;
> > >  }
> > > @@ -399,13 +402,26 @@ timer::pop_internal ()
> > >    /* Attribute the elapsed time to the element we're popping.  */
> > >    timevar_accumulate (&popped->timevar->elapsed, &m_start_time,
> > > &now);
> > >  
> > > +  /* Take the item off the stack.  */
> > > +  m_stack = m_stack->next;
> > > +
> > > +  /* Record the elapsed sub-time to the parent as well.  */
> > > +  if (m_stack && time_report_details)
> > > +    {
> > > +      if (! m_stack->timevar->children)
> > > +       m_stack->timevar->children = new child_map_t (5);
> > > +      bool existed_p;
> > > +      timevar_time_def &time
> > > +       = m_stack->timevar->children->get_or_insert (popped->timevar,
> > > &existed_p);
> > > +      if (! existed_p)
> > > +       memset (&time, 0, sizeof (timevar_time_def));
> > > +      timevar_accumulate (&time, &m_start_time, &now);
> > > +    }
> > 
> > Looks like inside a pop there's a lazy allocation of a map, and a
> > get_or_insert.  Could this make -ftime-report have an impact on the
> > timing?
> 
> Yes, of course taking the time via gettimeofday also has.  timevars
> do have a (low) overhead.
> 
> > >    /* Reset the start time; from now on, time is attributed to the
> > >       element just exposed on the stack.  */
> > >    m_start_time = now;
> > >  
> > > -  /* Take the item off the stack.  */
> > > -  m_stack = m_stack->next;
> > > -
> > >    /* Don't delete the stack element; instead, add it to the list of
> > >       unused elements for later use.  */
> > >    popped->next = m_unused_stack_instances;
> > > @@ -619,38 +635,38 @@ timer::validate_phases (FILE *fp) const
> > >  void
> > >  timer::print_row (FILE *fp,
> > >                   const timevar_time_def *total,
> > > -                 const timevar_def *tv)
> > > +                 const char *name, const timevar_time_def &elapsed)
> > >  {
> > >    /* The timing variable name.  */
> > > -  fprintf (fp, " %-24s:", tv->name);
> > > +  fprintf (fp, " %-24s:", name);
> > >  
> > >  #ifdef HAVE_USER_TIME
> > >    /* Print user-mode time for this process.  */
> > >    fprintf (fp, "%7.2f (%2.0f%%) usr",
> > > -          tv->elapsed.user,
> > > -          (total->user == 0 ? 0 : tv->elapsed.user / total->user) *
> > > 100);
> > > +          elapsed.user,
> > > +          (total->user == 0 ? 0 : elapsed.user / total->user) *
> > > 100);
> > >  #endif /* HAVE_USER_TIME */
> > >  
> > >  #ifdef HAVE_SYS_TIME
> > >    /* Print system-mode time for this process.  */
> > >    fprintf (fp, "%7.2f (%2.0f%%) sys",
> > > -          tv->elapsed.sys,
> > > -          (total->sys == 0 ? 0 : tv->elapsed.sys / total->sys) *
> > > 100);
> > > +          elapsed.sys,
> > > +          (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100);
> > >  #endif /* HAVE_SYS_TIME */
> > >  
> > >  #ifdef HAVE_WALL_TIME
> > >    /* Print wall clock time elapsed.  */
> > >    fprintf (fp, "%7.2f (%2.0f%%) wall",
> > > -          tv->elapsed.wall,
> > > -          (total->wall == 0 ? 0 : tv->elapsed.wall / total->wall) *
> > > 100);
> > > +          elapsed.wall,
> > > +          (total->wall == 0 ? 0 : elapsed.wall / total->wall) *
> > > 100);
> > >  #endif /* HAVE_WALL_TIME */
> > >  
> > >    /* Print the amount of ggc memory allocated.  */
> > >    fprintf (fp, "%8u kB (%2.0f%%) ggc",
> > > -          (unsigned) (tv->elapsed.ggc_mem >> 10),
> > > +          (unsigned) (elapsed.ggc_mem >> 10),
> > >            (total->ggc_mem == 0
> > >             ? 0
> > > -           : (float) tv->elapsed.ggc_mem / total->ggc_mem) * 100);
> > > +           : (float) elapsed.ggc_mem / total->ggc_mem) * 100);
> > >  
> > >    putc ('\n', fp);
> > >  }
> > > @@ -702,15 +718,47 @@ timer::print (FILE *fp)
> > >        if (!tv->used)
> > >         continue;
> > >  
> > > +      bool any_childs_with_time = false;
> > 
> > s/childs/children/
> 
> Fixed.
> 
> > > +      if (tv->children)
> > > +       for (child_map_t::iterator i = tv->children->begin ();
> > > +            i != tv->children->end (); ++i)
> > > +         if (!((*i).second.user < tiny
> > > +               && (*i).second.sys < tiny
> > > +               && (*i).second.wall < tiny
> > > +               && (*i).second.ggc_mem < GGC_MEM_BOUND))
> > > +           {
> > > +             any_childs_with_time = true;
> > > +             break;
> > > +           }
> > > +
> > >        /* Don't print timing variables if we're going to get a row of
> > > -         zeroes.  */
> > > -      if (tv->elapsed.user < tiny
> > > +         zeroes.  Unless there are children with non-zero time.  */
> > > +      if (! any_childs_with_time
> > > +         && tv->elapsed.user < tiny
> > >           && tv->elapsed.sys < tiny
> > >           && tv->elapsed.wall < tiny
> > >           && tv->elapsed.ggc_mem < GGC_MEM_BOUND)
> > >         continue;
> > 
> > Presumably that repeated conditional for a row of zeroes could be moved
> > to a helper function (or indeed, a method of the timevar?).
> 
> Good point.
> 
> > > -      print_row (fp, total, tv);
> > > +      print_row (fp, total, tv->name, tv->elapsed);
> > > +
> > > +      if (tv->children)
> > > +       for (child_map_t::iterator i = tv->children->begin ();
> > > +            i != tv->children->end (); ++i)
> > > +         {
> > > +           timevar_def *tv2 = (*i).first;
> > > +           /* Don't print timing variables if we're going to get a
> > > row of
> > > +              zeroes.  */
> > > +           if (!((*i).second.user < tiny
> > > +                 && (*i).second.sys < tiny
> > > +                 && (*i).second.wall < tiny
> > > +                 && (*i).second.ggc_mem < GGC_MEM_BOUND))
> > 
> > Same here.
> > 
> > > +             {
> > > +               char lname[256];
> > > +               sprintf (lname, "`- %s", tv2->name);
> > 
> > The jit has user-supplied names (via gcc_jit_timer_push) and I don't
> > think it imposes a length limit, so presumably snprintf would be better
> > here.
> >
> > Alternatively, how about adding a "prefix" param to print_row, or
> > simply printing the prefix to fp before calling print_row?
> 
> The latter didn't work for some reason (it at least messes up formatting).  
> But I can try again.

Ok, so I'd like to have the ':'s aligned so dumping up-front doesn't
work.  Thus I've ended up using snprintf here.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-08-12  Richard Biener  <rguent...@suse.de>

        * passes.c (execute_todo): Do not push/pop TV_TODO.
        (execute_one_ipa_transform_pass): Move timevar push/pop TODO execution.
        (execute_one_pass): Likewise.
        * common.opt (ftime-report-details): New switch.
        * doc/invoke.texi (ftime-report-details): Document.
        * timevar.h (timer::print_row): Adjust signature.
        (timer::all_zero): New static helper.
        (timer::child_map_t): New typedef.
        (timer::time_var_def): Add children field.
        * timevar.c (timer::named_items::print): Adjust.
        (timer::~timer): Free timevar recorded children.
        (timer::pop_internal): When -ftime-report-details record
        time spent in sub-timevars.
        (timer::print_row): Adjust.
        (timer::print): Print sub-timevar stats, use all_zero.
        * timevar.def (TV_TODO): Remove.

Index: gcc/passes.c
===================================================================
--- gcc/passes.c        (revision 238469)
+++ gcc/passes.c        (working copy)
@@ -2002,8 +2002,6 @@ execute_todo (unsigned int flags)
       && need_ssa_update_p (cfun))
     gcc_assert (flags & TODO_update_ssa_any);
 
-  timevar_push (TV_TODO);
-
   statistics_fini_pass ();
 
   if (flags)
@@ -2037,8 +2035,6 @@ execute_todo (unsigned int flags)
      df problems.  */
   if (flags & TODO_df_finish)
     df_finish_pass ((flags & TODO_df_verify) != 0);
-
-  timevar_pop (TV_TODO);
 }
 
 /* Verify invariants that should hold between passes.  This is a place
@@ -2190,20 +2186,16 @@ execute_one_ipa_transform_pass (struct c
 
   pass_init_dump_file (pass);
 
-  /* Run pre-pass verification.  */
-  execute_todo (ipa_pass->function_transform_todo_flags_start);
-
   /* If a timevar is present, start it.  */
   if (pass->tv_id != TV_NONE)
     timevar_push (pass->tv_id);
 
+  /* Run pre-pass verification.  */
+  execute_todo (ipa_pass->function_transform_todo_flags_start);
+
   /* Do it!  */
   todo_after = ipa_pass->function_transform (node);
 
-  /* Stop timevar.  */
-  if (pass->tv_id != TV_NONE)
-    timevar_pop (pass->tv_id);
-
   if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
     check_profile_consistency (pass->static_pass_number, 0, true);
 
@@ -2213,6 +2205,10 @@ execute_one_ipa_transform_pass (struct c
   if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
     check_profile_consistency (pass->static_pass_number, 1, true);
 
+  /* Stop timevar.  */
+  if (pass->tv_id != TV_NONE)
+    timevar_pop (pass->tv_id);
+
   if (dump_file)
     do_per_function (execute_function_dump, pass);
   pass_fini_dump_file (pass);
@@ -2321,6 +2317,10 @@ execute_one_pass (opt_pass *pass)
 
   pass_init_dump_file (pass);
 
+  /* If a timevar is present, start it.  */
+  if (pass->tv_id != TV_NONE)
+    timevar_push (pass->tv_id);
+
   /* Run pre-pass verification.  */
   execute_todo (pass->todo_flags_start);
 
@@ -2328,15 +2328,15 @@ execute_one_pass (opt_pass *pass)
     do_per_function (verify_curr_properties,
                     (void *)(size_t)pass->properties_required);
 
-  /* If a timevar is present, start it.  */
-  if (pass->tv_id != TV_NONE)
-    timevar_push (pass->tv_id);
-
   /* Do it!  */
   todo_after = pass->execute (cfun);
 
   if (todo_after & TODO_discard_function)
     {
+      /* Stop timevar.  */
+      if (pass->tv_id != TV_NONE)
+       timevar_pop (pass->tv_id);
+
       pass_fini_dump_file (pass);
 
       gcc_assert (cfun);
@@ -2363,10 +2363,6 @@ execute_one_pass (opt_pass *pass)
 
   do_per_function (clear_last_verified, NULL);
 
-  /* Stop timevar.  */
-  if (pass->tv_id != TV_NONE)
-    timevar_pop (pass->tv_id);
-
   do_per_function (update_properties_after_pass, pass);
 
   if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
@@ -2378,6 +2374,11 @@ execute_one_pass (opt_pass *pass)
     check_profile_consistency (pass->static_pass_number, 1, true);
 
   verify_interpass_invariants ();
+
+  /* Stop timevar.  */
+  if (pass->tv_id != TV_NONE)
+    timevar_pop (pass->tv_id);
+
   if (pass->type == IPA_PASS
       && ((ipa_opt_pass_d *)pass)->function_transform)
     {
Index: gcc/common.opt
===================================================================
--- gcc/common.opt      (revision 238469)
+++ gcc/common.opt      (working copy)
@@ -2255,6 +2255,10 @@ ftime-report
 Common Report Var(time_report)
 Report the time taken by each compiler pass.
 
+ftime-report-details
+Common Report Var(time_report_details)
+Record times taken by sub-phases separately.
+
 ftls-model=
 Common Joined RejectNegative Enum(tls_model) Var(flag_tls_default) 
Init(TLS_MODEL_GLOBAL_DYNAMIC)
 -ftls-model=[global-dynamic|local-dynamic|initial-exec|local-exec]     Set the 
default thread-local storage code generation model.
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi (revision 238469)
+++ gcc/doc/invoke.texi (working copy)
@@ -548,7 +548,7 @@ Objective-C and Objective-C++ Dialects}.
 -fprofile-report @gol
 -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol
 -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining-verbose @gol
--fstats  -fstack-usage  -ftime-report @gol
+-fstats  -fstack-usage  -ftime-report -ftime-report-details @gol
 -fvar-tracking-assignments-toggle -gtoggle @gol
 -print-file-name=@var{library}  -print-libgcc-file-name @gol
 -print-multi-directory  -print-multi-lib  -print-multi-os-directory @gol
@@ -12543,6 +12543,10 @@ print some statistics about each pass wh
 Makes the compiler print some statistics about the time consumed by each
 pass when it finishes.
 
+@item -ftime-report-details
+@opindex ftime-report-details
+Record the time consumed by infrastructure parts separately for each pass.
+
 @item -fira-verbose=@var{n}
 @opindex fira-verbose
 Control the verbosity of the dump file for the integrated register allocator.
Index: gcc/timevar.c
===================================================================
--- gcc/timevar.c       (revision 238469)
+++ gcc/timevar.c       (working copy)
@@ -205,7 +205,7 @@ timer::named_items::print (FILE *fp, con
     {
       timer::timevar_def *def = m_hash_map.get (item_name);
       gcc_assert (def);
-      m_timer->print_row (fp, total, def);
+      m_timer->print_row (fp, total, def->name, def->elapsed);
     }
 }
 
@@ -296,6 +296,8 @@ timer::~timer ()
       next = iter->next;
       free (iter);
     }
+  for (unsigned i = 0; i < TIMEVAR_LAST; ++i)
+    delete m_timevars[i].children;
 
   delete m_jit_client_items;
 }
@@ -399,13 +401,26 @@ timer::pop_internal ()
   /* Attribute the elapsed time to the element we're popping.  */
   timevar_accumulate (&popped->timevar->elapsed, &m_start_time, &now);
 
+  /* Take the item off the stack.  */
+  m_stack = m_stack->next;
+
+  /* Record the elapsed sub-time to the parent as well.  */
+  if (m_stack && time_report_details)
+    {
+      if (! m_stack->timevar->children)
+       m_stack->timevar->children = new child_map_t (5);
+      bool existed_p;
+      timevar_time_def &time
+       = m_stack->timevar->children->get_or_insert (popped->timevar, 
&existed_p);
+      if (! existed_p)
+       memset (&time, 0, sizeof (timevar_time_def));
+      timevar_accumulate (&time, &m_start_time, &now);
+    }
+
   /* Reset the start time; from now on, time is attributed to the
      element just exposed on the stack.  */
   m_start_time = now;
 
-  /* Take the item off the stack.  */
-  m_stack = m_stack->next;
-
   /* Don't delete the stack element; instead, add it to the list of
      unused elements for later use.  */
   popped->next = m_unused_stack_instances;
@@ -619,42 +634,54 @@ timer::validate_phases (FILE *fp) const
 void
 timer::print_row (FILE *fp,
                  const timevar_time_def *total,
-                 const timevar_def *tv)
+                 const char *name, const timevar_time_def &elapsed)
 {
   /* The timing variable name.  */
-  fprintf (fp, " %-24s:", tv->name);
+  fprintf (fp, " %-24s:", name);
 
 #ifdef HAVE_USER_TIME
   /* Print user-mode time for this process.  */
   fprintf (fp, "%7.2f (%2.0f%%) usr",
-          tv->elapsed.user,
-          (total->user == 0 ? 0 : tv->elapsed.user / total->user) * 100);
+          elapsed.user,
+          (total->user == 0 ? 0 : elapsed.user / total->user) * 100);
 #endif /* HAVE_USER_TIME */
 
 #ifdef HAVE_SYS_TIME
   /* Print system-mode time for this process.  */
   fprintf (fp, "%7.2f (%2.0f%%) sys",
-          tv->elapsed.sys,
-          (total->sys == 0 ? 0 : tv->elapsed.sys / total->sys) * 100);
+          elapsed.sys,
+          (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100);
 #endif /* HAVE_SYS_TIME */
 
 #ifdef HAVE_WALL_TIME
   /* Print wall clock time elapsed.  */
   fprintf (fp, "%7.2f (%2.0f%%) wall",
-          tv->elapsed.wall,
-          (total->wall == 0 ? 0 : tv->elapsed.wall / total->wall) * 100);
+          elapsed.wall,
+          (total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100);
 #endif /* HAVE_WALL_TIME */
 
   /* Print the amount of ggc memory allocated.  */
   fprintf (fp, "%8u kB (%2.0f%%) ggc",
-          (unsigned) (tv->elapsed.ggc_mem >> 10),
+          (unsigned) (elapsed.ggc_mem >> 10),
           (total->ggc_mem == 0
            ? 0
-           : (float) tv->elapsed.ggc_mem / total->ggc_mem) * 100);
+           : (float) elapsed.ggc_mem / total->ggc_mem) * 100);
 
   putc ('\n', fp);
 }
 
+/* Return whether ELAPSED is all zero.  */
+
+bool
+timer::all_zero (const timevar_time_def &elapsed)
+{
+  const double tiny = 5e-3;
+  return (elapsed.user < tiny
+         && elapsed.sys < tiny
+         && elapsed.wall < tiny
+         && elapsed.ggc_mem < GGC_MEM_BOUND);
+}
+
 /* Summarize timing variables to FP.  The timing variable TV_TOTAL has
    a special meaning -- it's considered to be the total elapsed time,
    for normalizing the others, and is displayed last.  */
@@ -691,7 +718,6 @@ timer::print (FILE *fp)
   for (id = 0; id < (unsigned int) TIMEVAR_LAST; ++id)
     {
       const timevar_def *tv = &m_timevars[(timevar_id_t) id];
-      const double tiny = 5e-3;
 
       /* Don't print the total execution time here; that goes at the
         end.  */
@@ -702,15 +728,38 @@ timer::print (FILE *fp)
       if (!tv->used)
        continue;
 
+      bool any_children_with_time = false;
+      if (tv->children)
+       for (child_map_t::iterator i = tv->children->begin ();
+            i != tv->children->end (); ++i)
+         if (! all_zero ((*i).second))
+           {
+             any_children_with_time = true;
+             break;
+           }
+
       /* Don't print timing variables if we're going to get a row of
-         zeroes.  */
-      if (tv->elapsed.user < tiny
-         && tv->elapsed.sys < tiny
-         && tv->elapsed.wall < tiny
-         && tv->elapsed.ggc_mem < GGC_MEM_BOUND)
+         zeroes.  Unless there are children with non-zero time.  */
+      if (! any_children_with_time
+         && all_zero (tv->elapsed))
        continue;
 
-      print_row (fp, total, tv);
+      print_row (fp, total, tv->name, tv->elapsed);
+
+      if (tv->children)
+       for (child_map_t::iterator i = tv->children->begin ();
+            i != tv->children->end (); ++i)
+         {
+           timevar_def *tv2 = (*i).first;
+           /* Don't print timing variables if we're going to get a row of
+              zeroes.  */
+           if (! all_zero ((*i).second))
+             {
+               char lname[256];
+               snprintf (lname, 256, "`- %s", tv2->name);
+               print_row (fp, total, lname, (*i).second);
+             }
+         }
     }
   if (m_jit_client_items)
     m_jit_client_items->print (fp, total);
Index: gcc/timevar.h
===================================================================
--- gcc/timevar.h       (revision 238469)
+++ gcc/timevar.h       (working copy)
@@ -131,9 +131,11 @@ class timer
   void pop_internal ();
   static void print_row (FILE *fp,
                         const timevar_time_def *total,
-                        const timevar_def *tv);
+                        const char *name, const timevar_time_def &elapsed);
+  static bool all_zero (const timevar_time_def &elapsed);
 
  private:
+  typedef hash_map<timevar_def *, timevar_time_def> child_map_t;
 
   /* Private type: a timing variable.  */
   struct timevar_def
@@ -155,6 +157,8 @@ class timer
     /* Nonzero if this timing variable was ever started or pushed onto
        the timing stack.  */
     unsigned used : 1;
+
+    child_map_t *children;
   };
 
   /* Private type: an element on the timing stack

Reply via email to