Patch Description: ================= I am working on a project to do global function layout in the linker where the linker reads the callgraph edge profile information, generated by FDO, and uses that to find a ordering of functions that will place functions calling each other frequently closer, like the Pettis-Hansen code ordering algorithm described in the paper "Profile-guided Code Poisitioning" in PLDI 1990.
This patch adds a flag that allows the callgraph edge profile information to be stored .note sections called ".note.callgraph.text". The new compiler flag -fcallgraph-profiles-sections generates these sections and must be used along with -fprofile-use. I have added a PARAM to only output callgraph edges greater than a specified threshold. Once this is available, the linker can read these sections and generate a global callgraph which can be used to determine a global function ordering. I am adding plugin support in the gold linker to allow linker plugins to be able to read the contents of sections and also adding plugin hooks to specify a desired ordering of functions to the linker. The linker patch is available here : http://sourceware.org/ml/binutils/2011-03/msg00043.html. Once this is available, linker plugins can be used to determine the function layout, like the Pettis-Hansen algorithm, of the final binary. Example: The new .note.callgraph.text sections looks like this for a function foo that calls bar 100 times and zap 50 times: **************************** .section .note.callgraph.text._Z3foov,"",@progbits .string "Function _Z3foov" .string "_Z3barv" .string "100" .string "_Z3zapv" .string "50" *************************** For now, this is for google/main. I will re-submit for review to trunk along with data layout. Google ref 41940 2011-06-07 Sriraman Tallam <tmsri...@google.com> * doc/invoke.texi: document option -fcallgraph-profiles-sections. * final.c (dump_cgraph_profiles): New function. (rest_of_handle_final): Create new section '.note.callgraph.text' with compiler flag -fcallgraph-profiles-sections * common.opt: New option -fcallgraph-profiles-sections. * params.def (DEFPARAM): New param PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD. Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 174789) +++ doc/invoke.texi (working copy) @@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}. -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol --fcheck-data-deps -fclone-hot-version-paths @gol +-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol -fcombine-stack-adjustments -fconserve-stack @gol -fcompare-elim -fcprop-registers -fcrossjumping @gol -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol @@ -8114,6 +8114,15 @@ Do not promote static functions with always inline @opindex fripa-verbose Enable printing of verbose information about dynamic inter-procedural optimizations. This is used in conjunction with the @option{-fripa}. + +@item -fcallgraph-profiles-sections +@opindex fcallgraph-profiles-sections +Emit call graph edge profile counts in .note.callgraph.text sections. This is +used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text +section is created for each function. This section lists every callee and the +number of times it is called. The params variable +"note-cgraph-section-edge-threshold" can be used to only list edges above a +certain threshold. @end table The following options control compiler behavior regarding floating Index: final.c =================================================================== --- final.c (revision 174789) +++ final.c (working copy) @@ -4321,13 +4321,37 @@ debug_free_queue (void) symbol_queue_size = 0; } } - + +/* List the call graph profiled edges whise value is greater than + PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the + ".note.callgraph.text" section. */ +static void +dump_cgraph_profiles (void) +{ + struct cgraph_node *node = cgraph_node (current_function_decl); + struct cgraph_edge *e; + struct cgraph_node *callee; + + for (e = node->callees; e != NULL; e = e->next_callee) + { + if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD)) + continue; + callee = e->callee; + fprintf (asm_out_file, "\t.string \"%s\"\n", + IDENTIFIER_POINTER (decl_assembler_name (callee->decl))); + fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n", + e->count); + } +} + /* Turn the RTL into assembly. */ static unsigned int rest_of_handle_final (void) { rtx x; const char *fnname; + char *profile_fnname; + unsigned int flags; /* Get the function's name, as described by its RTL. This may be different from the DECL_NAME name used in the source file. */ @@ -4387,6 +4411,21 @@ rest_of_handle_final (void) targetm.asm_out.destructor (XEXP (DECL_RTL (current_function_decl), 0), decl_fini_priority_lookup (current_function_decl)); + + /* With -fcgraph-section, add ".note.callgraph.text" section for storing + profiling information. */ + if (flag_callgraph_profiles_sections + && flag_profile_use + && cgraph_node (current_function_decl) != NULL) + { + flags = SECTION_DEBUG; + asprintf (&profile_fnname, ".note.callgraph.text.%s", fnname); + switch_to_section (get_section (profile_fnname, flags, NULL)); + fprintf (asm_out_file, "\t.string \"Function %s\"\n", fnname); + dump_cgraph_profiles (); + free (profile_fnname); + } + return 0; } Index: common.opt =================================================================== --- common.opt (revision 174789) +++ common.opt (working copy) @@ -907,6 +907,10 @@ fcaller-saves Common Report Var(flag_caller_saves) Optimization Save registers around function calls +fcallgraph-profiles-sections +Common Report Var(flag_callgraph_profiles_sections) Init(0) +Generate .note.callgraph.text sections listing callees and edge counts. + fcheck-data-deps Common Report Var(flag_check_data_deps) Compare the results of several data dependence analyzers. Index: params.def =================================================================== --- params.def (revision 174789) +++ params.def (working copy) @@ -1002,6 +1002,15 @@ DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH, "maximum length of the call graph path to be cloned " "while doing multiversioning", 2, 0, 5) + +/* Only output those call graph edges in .note.callgraph.text sections + whose count is greater than this value. */ +DEFPARAM (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD, + "note-cgraph-section-edge-threshold", + "minimum call graph edge count for inclusion in " + ".note.callgraph.text section", + 0, 0, 0) + /* Local variables: mode:c -- This patch is available for review at http://codereview.appspot.com/4591045