OK for google/main.

David

On Wed, Jun 8, 2011 at 9:13 AM, Sriraman Tallam <tmsri...@google.com> wrote:
> +davidxl
>
> On Tue, Jun 7, 2011 at 7:05 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>> Patch Description:
>> =================
>>
>> I am working on a project to do global function layout in the linker where
>> the linker reads the callgraph edge profile information, generated by FDO,
>> and uses that to find an ordering of functions that will place functions
>> calling each other frequently closer, like the Pettis-Hansen code ordering
>> algorithm described in the paper "Profile-guided Code Positioning" in PLDI
>> 1990.
>>
>> This patch adds a flag that allows the callgraph edge profile information to
>> be stored in .note sections called ".note.callgraph.text". The new compiler
>> flag -fcallgraph-profiles-sections generates these sections and must be used
>> along with -fprofile-use. I have added a PARAM to only output callgraph
>> edges greater than a specified threshold. Once this is available, the linker
>> can read these sections and generate a global callgraph which can be used to
>> determine a global function ordering.
>>
>> I am adding plugin support in the gold linker to allow linker plugins to be
>> able to read the contents of sections and also adding plugin hooks to
>> specify a desired ordering of functions to the linker. The linker patch is
>> available here : http://sourceware.org/ml/binutils/2011-03/msg00043.html.
>> Once this is available, linker plugins can be used to determine the function
>> layout, like the Pettis-Hansen algorithm, of the final binary.
>>
>> Example: The new .note.callgraph.text section looks like this for a
>> function foo that calls bar 100 times and zap 50 times:
>> ****************************
>>  .section        .note.callgraph.text._Z3foov,"",@progbits
>>         .string "Function _Z3foov"
>>         .string "_Z3barv"
>>         .string "100"
>>         .string "_Z3zapv"
>>         .string "50"
>> ***************************
>>
>> For now, this is for google/main. I will re-submit for review to trunk along
>> with data layout.
>>
>> Google ref 41940
>>
>> 2011-06-07  Sriraman Tallam  <tmsri...@google.com>
>>
>>        * doc/invoke.texi: Document option -fcallgraph-profiles-sections.
>>        * final.c (dump_cgraph_profiles): New function.
>>        (rest_of_handle_final): Create new section '.note.callgraph.text'
>>        with compiler flag -fcallgraph-profiles-sections.
>>        * common.opt: New option -fcallgraph-profiles-sections.
>>        * params.def (DEFPARAM): New param
>>        PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.
>>
>> Index: doc/invoke.texi
>> ===================================================================
>> --- doc/invoke.texi     (revision 174789)
>> +++ doc/invoke.texi     (working copy)
>> @@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
>>  -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
>>  -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
>>  -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
>> --fcheck-data-deps -fclone-hot-version-paths @gol
>> +-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
>>  -fcombine-stack-adjustments -fconserve-stack @gol
>>  -fcompare-elim -fcprop-registers -fcrossjumping @gol
>>  -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
>> @@ -8114,6 +8114,15 @@ Do not promote static functions with always inline
>>  @opindex fripa-verbose
>>  Enable printing of verbose information about dynamic inter-procedural
>>  optimizations.
>>  This is used in conjunction with the @option{-fripa}.
>> +
>> +@item -fcallgraph-profiles-sections
>> +@opindex fcallgraph-profiles-sections
>> +Emit call graph edge profile counts in .note.callgraph.text sections. This is
>> +used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text
>> +section is created for each function. This section lists every callee and the
>> +number of times it is called. The params variable
>> +"note-cgraph-section-edge-threshold" can be used to only list edges above a
>> +certain threshold.
>>  @end table
>>
>>  The following options control compiler behavior regarding floating
>> Index: final.c
>> ===================================================================
>> --- final.c     (revision 174789)
>> +++ final.c     (working copy)
>> @@ -4321,13 +4321,37 @@ debug_free_queue (void)
>>        symbol_queue_size = 0;
>>      }
>>  }
>> -
>> +
>> +/* List the call graph profiled edges whose value is greater than
>> +   PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
>> +   ".note.callgraph.text" section. */
>> +static void
>> +dump_cgraph_profiles (void)
>> +{
>> +  struct cgraph_node *node = cgraph_node (current_function_decl);
>> +  struct cgraph_edge *e;
>> +  struct cgraph_node *callee;
>> +
>> +  for (e = node->callees; e != NULL; e = e->next_callee)
>> +    {
>> +      if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
>> +        continue;
>> +      callee = e->callee;
>> +      fprintf (asm_out_file, "\t.string \"%s\"\n",
>> +               IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
>> +      fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
>> +               e->count);
>> +    }
>> +}
>> +
>>  /* Turn the RTL into assembly.  */
>>  static unsigned int
>>  rest_of_handle_final (void)
>>  {
>>    rtx x;
>>    const char *fnname;
>> +  char *profile_fnname;
>> +  unsigned int flags;
>>
>>    /* Get the function's name, as described by its RTL.  This may be
>>       different from the DECL_NAME name used in the source file.  */
>> @@ -4387,6 +4411,21 @@ rest_of_handle_final (void)
>>      targetm.asm_out.destructor (XEXP (DECL_RTL (current_function_decl), 0),
>>                                  decl_fini_priority_lookup (current_function_decl));
>> +
>> +  /* With -fcallgraph-profiles-sections, add ".note.callgraph.text" section
>> +     for storing profiling information.  */
>> +  if (flag_callgraph_profiles_sections
>> +      && flag_profile_use
>> +      && cgraph_node (current_function_decl) != NULL)
>> +    {
>> +      flags = SECTION_DEBUG;
>> +      asprintf (&profile_fnname, ".note.callgraph.text.%s", fnname);
>> +      switch_to_section (get_section (profile_fnname, flags, NULL));
>> +      fprintf (asm_out_file, "\t.string \"Function %s\"\n", fnname);
>> +      dump_cgraph_profiles ();
>> +      free (profile_fnname);
>> +    }
>> +
>>    return 0;
>>  }
>>
>> Index: common.opt
>> ===================================================================
>> --- common.opt  (revision 174789)
>> +++ common.opt  (working copy)
>> @@ -907,6 +907,10 @@ fcaller-saves
>>  Common Report Var(flag_caller_saves) Optimization
>>  Save registers around function calls
>>
>> +fcallgraph-profiles-sections
>> +Common Report Var(flag_callgraph_profiles_sections) Init(0)
>> +Generate .note.callgraph.text sections listing callees and edge counts.
>> +
>>  fcheck-data-deps
>>  Common Report Var(flag_check_data_deps)
>>  Compare the results of several data dependence analyzers.
>> Index: params.def
>> ===================================================================
>> --- params.def  (revision 174789)
>> +++ params.def  (working copy)
>> @@ -1002,6 +1002,15 @@ DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH,
>>           "maximum length of the call graph path to be cloned "
>>           "while doing multiversioning",
>>           2, 0, 5)
>> +
>> +/* Only output those call graph edges in .note.callgraph.text sections
>> +   whose count is greater than this value.  */
>> +DEFPARAM (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD,
>> +          "note-cgraph-section-edge-threshold",
>> +          "minimum call graph edge count for inclusion in "
>> +          ".note.callgraph.text section",
>> +          0, 0, 0)
>> +
>>  /*
>>  Local variables:
>>  mode:c
>>
>> --
>> This patch is available for review at http://codereview.appspot.com/4591045
>>
>
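
For anyone consuming these sections on the linker side, here is a minimal sketch of parsing one section body. It assumes the layout from the foo/bar/zap example in the patch description: a NUL-terminated "Function <name>" string followed by (callee, decimal count) string pairs. The names cg_edge, next_string and parse_callgraph_note are hypothetical; they are not part of the patch or of the gold plugin interface, and the sketch does not show how a plugin obtains the section contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one profiled call edge; not part of the patch.  */
struct cg_edge
{
  const char *callee;   /* Points into the section buffer.  */
  long long count;      /* Edge count parsed from its decimal string.  */
};

/* Return the next NUL-terminated string in [*POS, END) and advance *POS
   past its terminator, or return NULL if no complete string remains.  */
static const char *
next_string (const char **pos, const char *end)
{
  const char *s = *pos;
  const char *nul;

  if (s >= end)
    return NULL;
  nul = memchr (s, '\0', (size_t) (end - s));
  if (nul == NULL)
    return NULL;
  *pos = nul + 1;
  return s;
}

/* Parse one section body of SIZE bytes at DATA.  Store the caller name in
   *CALLER and at most MAX_EDGES edges in EDGES.  Return the number of
   edges stored, or -1 if the "Function <name>" header is missing.  */
static int
parse_callgraph_note (const char *data, size_t size, const char **caller,
                      struct cg_edge *edges, int max_edges)
{
  static const char prefix[] = "Function ";
  const char *pos = data;
  const char *end = data + size;
  const char *header;
  int n = 0;

  assert (data != NULL && caller != NULL && edges != NULL);

  header = next_string (&pos, end);
  if (header == NULL || strncmp (header, prefix, sizeof prefix - 1) != 0)
    return -1;
  *caller = header + sizeof prefix - 1;

  while (n < max_edges)
    {
      const char *callee = next_string (&pos, end);
      const char *count = callee ? next_string (&pos, end) : NULL;

      /* Stop at the end of the data or at an incomplete pair.  */
      if (count == NULL)
        break;
      edges[n].callee = callee;
      edges[n].count = strtoll (count, NULL, 10);
      n++;
    }
  return n;
}
```

Called on the bytes of the example section, this would yield caller _Z3foov with the edges (_Z3barv, 100) and (_Z3zapv, 50). The edge records point into the caller's buffer instead of copying, so the buffer has to outlive them. A real plugin would likely also need to merge edges from all input objects before computing an ordering.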