The following patch adds the ability to record time spent in utility
to the pass using it.  For things like CFG cleanup, alias stmt walking
or DF infrastructure work it is currently not visible which pass is
responsible for the time spent there.  With -ftime-report-details
you now get

 alias stmt walking      :   4.48 ( 2%) usr   0.14 ( 3%) sys   4.66 ( 2%) 
wall    1209 kB ( 0%) ggc
...
 tree PRE                :   3.58 ( 2%) usr   0.04 ( 1%) sys   3.71 ( 2%) 
wall   11870 kB ( 1%) ggc
 `- tree tail merge      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
wall       0 kB ( 0%) ggc
 `- alias stmt walking   :   0.88 ( 0%) usr   0.00 ( 0%) sys   0.91 ( 0%) 
wall     153 kB ( 0%) ggc
 `- loop init            :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) 
wall       0 kB ( 0%) ggc
 `- tree SSA incremental :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) 
wall       0 kB ( 0%) ggc
 `- tree operand scan    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) 
wall     544 kB ( 0%) ggc
 `- tree STMT verifier   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
wall       0 kB ( 0%) ggc
 tree FRE                :   3.91 ( 2%) usr   0.09 ( 2%) sys   3.71 ( 2%) 
wall    5658 kB ( 1%) ggc
 `- alias stmt walking   :   2.81 ( 2%) usr   0.08 ( 2%) sys   2.84 ( 2%) 
wall     819 kB ( 0%) ggc
 `- unaccounted todo     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) 
wall     851 kB ( 0%) ggc
 `- tree operand scan    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
wall       0 kB ( 0%) ggc
 `- dominance computation:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
wall       0 kB ( 0%) ggc
 tree code sinking       :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.17 ( 0%) 
wall    1268 kB ( 0%) ggc
 `- loop init            :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) 
wall     526 kB ( 0%) ggc
 `- dominance computation:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
wall       0 kB ( 0%) ggc
...

note that the full time spent in, say, alias stmt walking is still
recorded and dumped.  Note this also visualizes current nesting
of timevars and thus you'll get

 rest of compilation     :   1.02 ( 1%) usr   0.04 ( 1%) sys   1.17 ( 1%) 
wall    4977 kB ( 0%) ggc
 `- hard reg cprop       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
wall       0 kB ( 0%) ggc
 `- tree conservative DCE:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
wall       0 kB ( 0%) ggc
...

because we have passes w/o timevar that are accumulated in
rest of compilation.  There is the opportunity to clean this up.

I've extended the pass timevars to also cover their TODOs and thus
removed TV_TODO.

The accounting is "flat", that is, at most one level of sub-timevars
are recorded.

Does this look sensible and useful given the imperfectness of the
current timevar assignments?

Bootstrap pending.

Thanks,
Richard.

2016-08-10  Richard Biener  <rguent...@suse.de>

        * passes.c (execute_todo): Do not push/pop TV_TODO.
        (execute_one_ipa_transform_pass): Move timevar push/pop TODO execution.
        (execute_one_pass): Likewise.
        * common.opt (ftime-report-details): New switch.
        * doc/invoke.texi (ftime-report-details): Document.
        * timevar.h (timer::print_row): Adjust signature.
        (timer::child_map_t): New typedef.
        (timer::time_var_def): Add children field.
        * timevar.c (timer::named_items::print): Adjust.
        (timer::~timer): Free timevar recorded children.
        (timer::pop_internal): When -ftime-report-details record
        time spent in sub-timevars.
        (timer::print_row): Adjust.
        (timer::print): Print sub-timevar stats.
        * timevar.def (TV_TODO): Remove.

Index: gcc/common.opt
===================================================================
--- gcc/common.opt      (revision 238469)
+++ gcc/common.opt      (working copy)
@@ -2255,6 +2255,10 @@ ftime-report
 Common Report Var(time_report)
 Report the time taken by each compiler pass.
 
+ftime-report-details
+Common Report Var(time_report_details)
+Record times taken by sub-phases separately.
+
 ftls-model=
 Common Joined RejectNegative Enum(tls_model) Var(flag_tls_default) 
Init(TLS_MODEL_GLOBAL_DYNAMIC)
 -ftls-model=[global-dynamic|local-dynamic|initial-exec|local-exec]     Set the 
default thread-local storage code generation model.
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi (revision 238469)
+++ gcc/doc/invoke.texi (working copy)
@@ -548,7 +548,7 @@ Objective-C and Objective-C++ Dialects}.
 -fprofile-report @gol
 -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol
 -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining-verbose @gol
--fstats  -fstack-usage  -ftime-report @gol
+-fstats  -fstack-usage  -ftime-report -ftime-report-details @gol
 -fvar-tracking-assignments-toggle -gtoggle @gol
 -print-file-name=@var{library}  -print-libgcc-file-name @gol
 -print-multi-directory  -print-multi-lib  -print-multi-os-directory @gol
@@ -12543,6 +12543,10 @@ print some statistics about each pass wh
 Makes the compiler print some statistics about the time consumed by each
 pass when it finishes.
 
+@item -ftime-report
+@opindex ftime-report
+Record the time consumed by infrastructure parts separately for each pass.
+
 @item -fira-verbose=@var{n}
 @opindex fira-verbose
 Control the verbosity of the dump file for the integrated register allocator.
Index: gcc/timevar.c
===================================================================
--- gcc/timevar.c       (revision 238469)
+++ gcc/timevar.c       (working copy)
@@ -205,7 +205,7 @@ timer::named_items::print (FILE *fp, con
     {
       timer::timevar_def *def = m_hash_map.get (item_name);
       gcc_assert (def);
-      m_timer->print_row (fp, total, def);
+      m_timer->print_row (fp, total, def->name, def->elapsed);
     }
 }
 
@@ -296,6 +296,9 @@ timer::~timer ()
       next = iter->next;
       free (iter);
     }
+  for (unsigned i = 0; i < TIMEVAR_LAST; ++i)
+    if (m_timevars[i].children)
+      delete m_timevars[i].children;
 
   delete m_jit_client_items;
 }
@@ -399,13 +402,26 @@ timer::pop_internal ()
   /* Attribute the elapsed time to the element we're popping.  */
   timevar_accumulate (&popped->timevar->elapsed, &m_start_time, &now);
 
+  /* Take the item off the stack.  */
+  m_stack = m_stack->next;
+
+  /* Record the elapsed sub-time to the parent as well.  */
+  if (m_stack && time_report_details)
+    {
+      if (! m_stack->timevar->children)
+       m_stack->timevar->children = new child_map_t (5);
+      bool existed_p;
+      timevar_time_def &time
+       = m_stack->timevar->children->get_or_insert (popped->timevar, 
&existed_p);
+      if (! existed_p)
+       memset (&time, 0, sizeof (timevar_time_def));
+      timevar_accumulate (&time, &m_start_time, &now);
+    }
+
   /* Reset the start time; from now on, time is attributed to the
      element just exposed on the stack.  */
   m_start_time = now;
 
-  /* Take the item off the stack.  */
-  m_stack = m_stack->next;
-
   /* Don't delete the stack element; instead, add it to the list of
      unused elements for later use.  */
   popped->next = m_unused_stack_instances;
@@ -619,38 +635,38 @@ timer::validate_phases (FILE *fp) const
 void
 timer::print_row (FILE *fp,
                  const timevar_time_def *total,
-                 const timevar_def *tv)
+                 const char *name, const timevar_time_def &elapsed)
 {
   /* The timing variable name.  */
-  fprintf (fp, " %-24s:", tv->name);
+  fprintf (fp, " %-24s:", name);
 
 #ifdef HAVE_USER_TIME
   /* Print user-mode time for this process.  */
   fprintf (fp, "%7.2f (%2.0f%%) usr",
-          tv->elapsed.user,
-          (total->user == 0 ? 0 : tv->elapsed.user / total->user) * 100);
+          elapsed.user,
+          (total->user == 0 ? 0 : elapsed.user / total->user) * 100);
 #endif /* HAVE_USER_TIME */
 
 #ifdef HAVE_SYS_TIME
   /* Print system-mode time for this process.  */
   fprintf (fp, "%7.2f (%2.0f%%) sys",
-          tv->elapsed.sys,
-          (total->sys == 0 ? 0 : tv->elapsed.sys / total->sys) * 100);
+          elapsed.sys,
+          (total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100);
 #endif /* HAVE_SYS_TIME */
 
 #ifdef HAVE_WALL_TIME
   /* Print wall clock time elapsed.  */
   fprintf (fp, "%7.2f (%2.0f%%) wall",
-          tv->elapsed.wall,
-          (total->wall == 0 ? 0 : tv->elapsed.wall / total->wall) * 100);
+          elapsed.wall,
+          (total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100);
 #endif /* HAVE_WALL_TIME */
 
   /* Print the amount of ggc memory allocated.  */
   fprintf (fp, "%8u kB (%2.0f%%) ggc",
-          (unsigned) (tv->elapsed.ggc_mem >> 10),
+          (unsigned) (elapsed.ggc_mem >> 10),
           (total->ggc_mem == 0
            ? 0
-           : (float) tv->elapsed.ggc_mem / total->ggc_mem) * 100);
+           : (float) elapsed.ggc_mem / total->ggc_mem) * 100);
 
   putc ('\n', fp);
 }
@@ -702,15 +718,47 @@ timer::print (FILE *fp)
       if (!tv->used)
        continue;
 
+      bool any_childs_with_time = false;
+      if (tv->children)
+       for (child_map_t::iterator i = tv->children->begin ();
+            i != tv->children->end (); ++i)
+         if (!((*i).second.user < tiny
+               && (*i).second.sys < tiny
+               && (*i).second.wall < tiny
+               && (*i).second.ggc_mem < GGC_MEM_BOUND))
+           {
+             any_childs_with_time = true;
+             break;
+           }
+
       /* Don't print timing variables if we're going to get a row of
-         zeroes.  */
-      if (tv->elapsed.user < tiny
+         zeroes.  Unless there are children with non-zero time.  */
+      if (! any_childs_with_time
+         && tv->elapsed.user < tiny
          && tv->elapsed.sys < tiny
          && tv->elapsed.wall < tiny
          && tv->elapsed.ggc_mem < GGC_MEM_BOUND)
        continue;
 
-      print_row (fp, total, tv);
+      print_row (fp, total, tv->name, tv->elapsed);
+
+      if (tv->children)
+       for (child_map_t::iterator i = tv->children->begin ();
+            i != tv->children->end (); ++i)
+         {
+           timevar_def *tv2 = (*i).first;
+           /* Don't print timing variables if we're going to get a row of
+              zeroes.  */
+           if (!((*i).second.user < tiny
+                 && (*i).second.sys < tiny
+                 && (*i).second.wall < tiny
+                 && (*i).second.ggc_mem < GGC_MEM_BOUND))
+             {
+               char lname[256];
+               sprintf (lname, "`- %s", tv2->name);
+               print_row (fp, total, lname, (*i).second);
+             }
+         }
     }
   if (m_jit_client_items)
     m_jit_client_items->print (fp, total);
Index: gcc/timevar.h
===================================================================
--- gcc/timevar.h       (revision 238469)
+++ gcc/timevar.h       (working copy)
@@ -131,9 +131,10 @@ class timer
   void pop_internal ();
   static void print_row (FILE *fp,
                         const timevar_time_def *total,
-                        const timevar_def *tv);
+                        const char *name, const timevar_time_def &elapsed);
 
  private:
+  typedef hash_map<timevar_def *, timevar_time_def> child_map_t;
 
   /* Private type: a timing variable.  */
   struct timevar_def
@@ -155,6 +156,8 @@ class timer
     /* Nonzero if this timing variable was ever started or pushed onto
        the timing stack.  */
     unsigned used : 1;
+
+    child_map_t *children;
   };
 
   /* Private type: an element on the timing stack
Index: gcc/passes.c
===================================================================
*** gcc/passes.c        (revision 238469)
--- gcc/passes.c        (working copy)
*************** execute_todo (unsigned int flags)
*** 2002,2009 ****
        && need_ssa_update_p (cfun))
      gcc_assert (flags & TODO_update_ssa_any);
  
-   timevar_push (TV_TODO);
- 
    statistics_fini_pass ();
  
    if (flags)
--- 2002,2007 ----
*************** execute_todo (unsigned int flags)
*** 2037,2044 ****
       df problems.  */
    if (flags & TODO_df_finish)
      df_finish_pass ((flags & TODO_df_verify) != 0);
- 
-   timevar_pop (TV_TODO);
  }
  
  /* Verify invariants that should hold between passes.  This is a place
--- 2035,2040 ----
*************** execute_one_ipa_transform_pass (struct c
*** 2190,2209 ****
  
    pass_init_dump_file (pass);
  
-   /* Run pre-pass verification.  */
-   execute_todo (ipa_pass->function_transform_todo_flags_start);
- 
    /* If a timevar is present, start it.  */
    if (pass->tv_id != TV_NONE)
      timevar_push (pass->tv_id);
  
    /* Do it!  */
    todo_after = ipa_pass->function_transform (node);
  
-   /* Stop timevar.  */
-   if (pass->tv_id != TV_NONE)
-     timevar_pop (pass->tv_id);
- 
    if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
      check_profile_consistency (pass->static_pass_number, 0, true);
  
--- 2186,2201 ----
  
    pass_init_dump_file (pass);
  
    /* If a timevar is present, start it.  */
    if (pass->tv_id != TV_NONE)
      timevar_push (pass->tv_id);
  
+   /* Run pre-pass verification.  */
+   execute_todo (ipa_pass->function_transform_todo_flags_start);
+ 
    /* Do it!  */
    todo_after = ipa_pass->function_transform (node);
  
    if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
      check_profile_consistency (pass->static_pass_number, 0, true);
  
*************** execute_one_ipa_transform_pass (struct c
*** 2213,2218 ****
--- 2205,2214 ----
    if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
      check_profile_consistency (pass->static_pass_number, 1, true);
  
+   /* Stop timevar.  */
+   if (pass->tv_id != TV_NONE)
+     timevar_pop (pass->tv_id);
+ 
    if (dump_file)
      do_per_function (execute_function_dump, pass);
    pass_fini_dump_file (pass);
*************** execute_one_pass (opt_pass *pass)
*** 2321,2326 ****
--- 2317,2326 ----
  
    pass_init_dump_file (pass);
  
+   /* If a timevar is present, start it.  */
+   if (pass->tv_id != TV_NONE)
+     timevar_push (pass->tv_id);
+ 
    /* Run pre-pass verification.  */
    execute_todo (pass->todo_flags_start);
  
*************** execute_one_pass (opt_pass *pass)
*** 2328,2342 ****
      do_per_function (verify_curr_properties,
                     (void *)(size_t)pass->properties_required);
  
-   /* If a timevar is present, start it.  */
-   if (pass->tv_id != TV_NONE)
-     timevar_push (pass->tv_id);
- 
    /* Do it!  */
    todo_after = pass->execute (cfun);
  
    if (todo_after & TODO_discard_function)
      {
        pass_fini_dump_file (pass);
  
        gcc_assert (cfun);
--- 2328,2342 ----
      do_per_function (verify_curr_properties,
                     (void *)(size_t)pass->properties_required);
  
    /* Do it!  */
    todo_after = pass->execute (cfun);
  
    if (todo_after & TODO_discard_function)
      {
+       /* Stop timevar.  */
+       if (pass->tv_id != TV_NONE)
+       timevar_pop (pass->tv_id);
+ 
        pass_fini_dump_file (pass);
  
        gcc_assert (cfun);
*************** execute_one_pass (opt_pass *pass)
*** 2363,2372 ****
  
    do_per_function (clear_last_verified, NULL);
  
-   /* Stop timevar.  */
-   if (pass->tv_id != TV_NONE)
-     timevar_pop (pass->tv_id);
- 
    do_per_function (update_properties_after_pass, pass);
  
    if (profile_report && cfun && (cfun->curr_properties & PROP_cfg))
--- 2363,2368 ----
*************** execute_one_pass (opt_pass *pass)
*** 2378,2383 ****
--- 2374,2384 ----
      check_profile_consistency (pass->static_pass_number, 1, true);
  
    verify_interpass_invariants ();
+ 
+   /* Stop timevar.  */
+   if (pass->tv_id != TV_NONE)
+     timevar_pop (pass->tv_id);
+ 
    if (pass->type == IPA_PASS
        && ((ipa_opt_pass_d *)pass)->function_transform)
      {
Index: gcc/timevar.def
===================================================================
*** gcc/timevar.def     (revision 238469)
--- gcc/timevar.def     (working copy)
*************** DEFTIMEVAR (TV_POSTRELOAD            , "unacco
*** 290,296 ****
  DEFTIMEVAR (TV_LATE_COMPILATION            , "unaccounted late compilation")
  DEFTIMEVAR (TV_REMOVE_UNUSED       , "remove unused locals")
  DEFTIMEVAR (TV_ADDRESS_TAKEN       , "address taken")
- DEFTIMEVAR (TV_TODO                , "unaccounted todo")
  DEFTIMEVAR (TV_VERIFY_LOOP_CLOSED    , "verify loop closed")
  DEFTIMEVAR (TV_VERIFY_RTL_SHARING    , "verify RTL sharing")
  DEFTIMEVAR (TV_REBUILD_FREQUENCIES   , "rebuild frequencies")
--- 290,295 ----

Reply via email to