Patch Description:
=================

I am working on a project to do global function layout in the linker where the 
linker reads the callgraph edge profile information, generated by FDO, and uses 
that to find a ordering of functions that will place functions calling each 
other frequently closer, like the Pettis-Hansen code ordering algorithm 
described in the paper "Profile-guided Code Poisitioning" in PLDI 1990.

This patch adds a flag that allows the callgraph edge profile information to be 
stored .note sections called ".note.callgraph.text". The new compiler flag 
-fcallgraph-profiles-sections generates these sections and must be used along 
with -fprofile-use. I have added a PARAM to only output callgraph edges greater 
than a specified threshold. Once this is available, the linker can read these 
sections and generate a global callgraph which can be used to determine a 
global function ordering.

I am adding plugin support in the gold linker to allow linker plugins to be 
able to read the contents of sections and also adding plugin hooks to specify a 
desired ordering of functions to the linker. The linker patch is available here 
: http://sourceware.org/ml/binutils/2011-03/msg00043.html. Once this is 
available, linker plugins can be used to determine the function layout, like 
the Pettis-Hansen algorithm, of the final binary.

Example: The new .note.callgraph.text sections looks like this for a function 
foo that calls bar 100 times and zap 50 times:
****************************
.section        .note.callgraph.text._Z3foov,"",@progbits
        .string "Function _Z3foov"
        .string "_Z3barv"
        .string "100"
        .string "_Z3zapv"
        .string "50"
***************************

For now, this is for google/main. I will re-submit for review to trunk along 
with data layout.

Google ref 41940

2011-06-07  Sriraman Tallam  <tmsri...@google.com>

        * doc/invoke.texi: document option -fcallgraph-profiles-sections.
        * final.c  (dump_cgraph_profiles): New function.
        (rest_of_handle_final): Create new section '.note.callgraph.text'
        with compiler flag -fcallgraph-profiles-sections
        * common.opt: New option -fcallgraph-profiles-sections.
        * params.def (DEFPARAM): New param
        PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi     (revision 174789)
+++ doc/invoke.texi     (working copy)
@@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
 -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
 -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
 -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
--fcheck-data-deps -fclone-hot-version-paths @gol
+-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
 -fcombine-stack-adjustments -fconserve-stack @gol
 -fcompare-elim -fcprop-registers -fcrossjumping @gol
 -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
@@ -8114,6 +8114,15 @@ Do not promote static functions with always inline
 @opindex fripa-verbose
 Enable printing of verbose information about dynamic inter-procedural 
optimizations.
 This is used in conjunction with the @option{-fripa}.
+
+@item -fcallgraph-profiles-sections
+@opindex fcallgraph-profiles-sections
+Emit call graph edge profile counts in .note.callgraph.text sections. This is
+used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text
+section is created for each function. This section lists every callee and the
+number of times it is called. The params variable
+"note-cgraph-section-edge-threshold" can be used to only list edges above a
+certain threshold.
 @end table
 
 The following options control compiler behavior regarding floating
Index: final.c
===================================================================
--- final.c     (revision 174789)
+++ final.c     (working copy)
@@ -4321,13 +4321,37 @@ debug_free_queue (void)
       symbol_queue_size = 0;
     }
 }
-
+
+/* List the call graph profiled edges whise value is greater than
+   PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
+   ".note.callgraph.text" section. */
+static void
+dump_cgraph_profiles (void)
+{
+  struct cgraph_node *node = cgraph_node (current_function_decl);
+  struct cgraph_edge *e;
+  struct cgraph_node *callee;
+
+  for (e = node->callees; e != NULL; e = e->next_callee)
+    {
+      if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
+        continue;
+      callee = e->callee;
+      fprintf (asm_out_file, "\t.string \"%s\"\n",
+               IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
+      fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
+               e->count);
+    }
+}
+
 /* Turn the RTL into assembly.  */
 static unsigned int
 rest_of_handle_final (void)
 {
   rtx x;
   const char *fnname;
+  char *profile_fnname;
+  unsigned int flags;
 
   /* Get the function's name, as described by its RTL.  This may be
      different from the DECL_NAME name used in the source file.  */
@@ -4387,6 +4411,21 @@ rest_of_handle_final (void)
     targetm.asm_out.destructor (XEXP (DECL_RTL (current_function_decl), 0),
                                decl_fini_priority_lookup
                                  (current_function_decl));
+
+  /* With -fcgraph-section, add ".note.callgraph.text" section for storing
+     profiling information. */
+  if (flag_callgraph_profiles_sections
+      && flag_profile_use
+      && cgraph_node (current_function_decl) != NULL)
+    {
+      flags = SECTION_DEBUG;
+      asprintf (&profile_fnname, ".note.callgraph.text.%s", fnname);
+      switch_to_section (get_section (profile_fnname, flags, NULL));
+      fprintf (asm_out_file, "\t.string \"Function %s\"\n", fnname);
+      dump_cgraph_profiles ();
+      free (profile_fnname);
+    }
+
   return 0;
 }
 
Index: common.opt
===================================================================
--- common.opt  (revision 174789)
+++ common.opt  (working copy)
@@ -907,6 +907,10 @@ fcaller-saves
 Common Report Var(flag_caller_saves) Optimization
 Save registers around function calls
 
+fcallgraph-profiles-sections
+Common Report Var(flag_callgraph_profiles_sections) Init(0)
+Generate .note.callgraph.text sections listing callees and edge counts.
+
 fcheck-data-deps
 Common Report Var(flag_check_data_deps)
 Compare the results of several data dependence analyzers.
Index: params.def
===================================================================
--- params.def  (revision 174789)
+++ params.def  (working copy)
@@ -1002,6 +1002,15 @@ DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH,
          "maximum length of the call graph path to be cloned "
           "while doing multiversioning",
          2, 0, 5)
+
+/* Only output those call graph edges in .note.callgraph.text sections
+   whose count is greater than this value. */
+DEFPARAM (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD,
+         "note-cgraph-section-edge-threshold",
+         "minimum call graph edge count for inclusion in "
+          ".note.callgraph.text section",
+         0, 0, 0)
+
 /*
 Local variables:
 mode:c

--
This patch is available for review at http://codereview.appspot.com/4591045

Reply via email to