https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118508
Bug ID: 118508
Summary: 10% performance drop when enabling autofdo for spec2017 554.roms_r
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: gcov-profile
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Target Milestone: ---

Compiled with -march=x86-64-v3 -O2.

Part of the dump_gcov output looks like this:

__step3d_t_mod_MOD_step3d_t total:5500129 head:0
  0: 0
  29: 0
  30: 0
  31: 0
  32: 0
  36: 0
  37: 0
  38: 0
  39: 0
  46: 0
  59: 0
  60: 0
  62: 0
  59: step3d_t_tile total:5500129
    4: 0
    4.2: 0
    4.4: 0
    4.6: 0
    4.8: 0
    5: 0
    5.2: 0
    5.4: 0
    5.6: 0
    5.8: 0
    5.10: 0
    5.12: 0
    7: 0
    7.2: 0
    7.4: 1
    8: 0
    8.2: 0

step3d_t_tile is local and is only called by step3d_t. AutoFDO performs early inlining when a call-graph edge is hot, and to decide that it checks the total count recorded for the callsite. Unfortunately, the name it looks up is DECL_ASSEMBLER_NAME (edge->callee->decl), which is __step3d_t_mod_MOD_step3d_t_tile, while the corresponding name in the AFDO string table is step3d_t_tile (without the module prefix; I guess it comes from the debug string table). Because of this mismatch, AutoFDO loses the profile information for step3d_t_tile, treats the function as cold and optimizes it for size.

A hack like the one below recovers the performance and improves 554.roms_r by a further 3% with AutoFDO:

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 3f890e6d1e6..ae8dd9bfdaf 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -837,8 +837,10 @@ autofdo_source_profile::get_callsite_total_count (
   function_instance *s = get_function_instance_by_inline_stack (stack);
 
   if (s == NULL
-      || afdo_string_table->get_index (IDENTIFIER_POINTER (
-             DECL_ASSEMBLER_NAME (edge->callee->decl))) != s->name ())
+      || (afdo_string_table->get_index (IDENTIFIER_POINTER (
+             DECL_ASSEMBLER_NAME (edge->callee->decl))) != s->name ()
+          && afdo_string_table->get_index_by_decl (edge->callee->decl)
+             != s->name()))
     return 0;
 
   return s->total_count ();