> -----Original Message-----
> From: Prathamesh Kulkarni <[email protected]>
> Sent: 06 October 2025 19:41
> To: [email protected]; Jan Hubicka <[email protected]>
> Subject: [RFC] Enable time profile function reordering with AutoFDO
> 
> Hi Honza,
> The attached patch enables time profile based reordering with AutoFDO
> with -fauto-profile -fprofile-reorder-functions, by mapping timestamps
> obtained from perf into node->tp_first_run, and is based on top of
> Dhruv's sourcefile tracking patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2025-September/694800.html
> 
> The rationale for doing this is:
> (1) GCC already implements time-profile function reordering, the patch
> enables it with AutoFDO.
> (2) While time profile ordering is primarily meant for optimizing
> startup time, we've also observed good effects on code-locality for
> large internal workloads.
> (3) Possibly useful for function reordering when accurate profile
> annotation is hard with AutoFDO, for example if branch samples are
> missing (due to the absence of an LBR-like structure).
> 
> On the AutoFDO tools side, I have a patch that extends gcov to emit a
> 64-bit perf timestamp that records the first execution of a function,
> which loosely corresponds to PGO's time_profile counter.
> The timestamp is stored adjacent to the head field in the toplevel
> function info.
> I will post a patch for this shortly on the AutoFDO tools upstream repo.
> 
> On GCC side, the patch makes the following changes:
> 
> (1) Changes to auto-profile pass:
> The patch adds a new field timestamp to function_instance, and
> populates it in read_function_instance.
> 
> It maintains a new timestamp_info_map from timestamp -> <name,
> tp_first_run>, which maps timestamps sorted in ascending order to
> (1..N), so lowest ordered timestamp is mapped to 1 and so on. The
> rationale for this is that timestamps are 64-bit integers, and we
> don't need the full 64-bit range for ordering by tp_first_run.
> 
> During annotation, the timestamp associated with function_instance is
> looked up in timestamp_info_map, and corresponding mapped value is
> assigned to node->tp_first_run.
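
A minimal sketch of the timestamp -> (1..N) mapping described above, not the code from the patch. The types sampled_function and timestamp_info and both helper functions are hypothetical names used only for illustration. It also assumes that a timestamp of 0 means "no sample" and that a tp_first_run of 0 means "no time profile".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-function data read from the gcov file.
struct sampled_function
{
  std::string name;
  uint64_t timestamp;  // perf timestamp of first execution; 0 = none (assumption)
};

// Value kept per timestamp: function name and its compact rank.
struct timestamp_info
{
  std::string name;
  int tp_first_run;
};

// Build timestamp -> <name, tp_first_run>.  std::map iterates its keys in
// ascending order, so a single pass assigns ranks 1..N.  If two functions
// share a timestamp, only the first one seen is kept in this sketch.
std::map<uint64_t, timestamp_info>
build_timestamp_info_map (const std::vector<sampled_function> &funcs)
{
  std::map<uint64_t, timestamp_info> m;
  for (const auto &f : funcs)
    if (f.timestamp != 0)
      m.emplace (f.timestamp, timestamp_info{f.name, 0});

  int rank = 0;
  for (auto &entry : m)
    entry.second.tp_first_run = ++rank;

  assert (static_cast<std::size_t> (rank) == m.size ());
  return m;
}

// Return the rank for a timestamp, or 0 if the timestamp is unknown.
int
lookup_tp_first_run (const std::map<uint64_t, timestamp_info> &m,
                     uint64_t timestamp)
{
  auto it = m.find (timestamp);
  return it == m.end () ? 0 : it->second.tp_first_run;
}
```

During annotation, the value returned by lookup_tp_first_run would be the one stored in node->tp_first_run. The ranks fit comfortably in an int even though the raw timestamps need 64 bits.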
> 
> (2) Handling clones:
> Currently, for clones not registered in call graph before auto-profile
> pass, the tp_first_run field is copied from original function, when
> the clone is created.
> However that may not correspond to the actual order of functions.
> 
> For example, if we have two profiled clones of foo:
> foo.constprop.1, foo.constprop.2
> 
> both will get same value for tp_first_run as foo->tp_first_run, which
> might not correspond to time profile order.
> 
> To address this, the patch introduces a new IPA pass
> ipa_adjust_tp_first_run, that streams <clone name, tp_first_run> from
> timestamp_info_map during LGEN, and during WPA reads it, and sets
> clone's tp_first_run field accordingly.
> The pass is placed pretty late (just before locality_cloning), by that
> point clones would be registered in the call graph.
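
To illustrate the clone adjustment described above, here is a rough sketch under stated assumptions. It models the streamed <clone name, tp_first_run> pairs as a plain hash map and the call graph as a vector; the node struct and adjust_tp_first_run function are hypothetical and do not reflect the actual pass or LTO streaming code.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified call graph node.
struct node
{
  std::string name;
  int tp_first_run;  // for a clone, initially copied from the original function
};

// Overwrite tp_first_run for every node whose name appears in the streamed
// <clone name, tp_first_run> table.  Returns the number of nodes changed.
int
adjust_tp_first_run (std::vector<node> &nodes,
                     const std::unordered_map<std::string, int> &streamed)
{
  int adjusted = 0;
  for (auto &n : nodes)
    {
      auto it = streamed.find (n.name);
      if (it != streamed.end () && it->second != n.tp_first_run)
        {
          n.tp_first_run = it->second;
          ++adjusted;
        }
    }
  return adjusted;
}
```

With foo, foo.constprop.1 and foo.constprop.2 all starting out with foo's value, the two clones end up with their own ranks while foo and any function missing from the table keep theirs.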
> 
> Dhruv's sourcefile tracking patch already handles LTO privatized
> functions.
> The patch adds a (temporary) workaround for functions with
> mismatched/empty filenames from gcov, to avoid getting dropped in
> afdo_annotate_cfg by iterating through all filenames in afdo_string_table
> if get_function_instance_by_decl fails to find function_instance with
> lbasename (DECL_SOURCE_FILE (decl)).
> 
> (3) Grouping profiled functions together in as few partitions as
> possible (preferably single).
> The patch places profiled functions in time profile order together in
> as few partitions as possible to take better advantage of code locality.
> Unlike PGO, where every instrumented function gets a time profile
> counter, with AutoFDO, the sampled functions are a fraction of the
> total executed ones.
> Similarly, in default_function_section, it overrides hot/cold
> partitioning so that grouping of profiled functions isn't disrupted.
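
The grouping idea above can be sketched roughly as follows. This is only an illustration: the func struct and partition_profiled_first are hypothetical, partitions are capped by function count rather than by the size metric the real LTO partitioner uses, and tp_first_run == 0 is assumed to mean "not sampled".

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified function record.
struct func
{
  std::string name;
  int tp_first_run;  // 0 = not sampled (assumption)
};

// Order profiled functions first, by ascending tp_first_run, then fill
// partitions greedily so the profiled ones span as few partitions as the
// cap allows.  Unprofiled functions keep their relative input order.
std::vector<std::vector<func>>
partition_profiled_first (std::vector<func> funcs,
                          std::size_t max_per_partition)
{
  std::stable_sort (funcs.begin (), funcs.end (),
                    [] (const func &a, const func &b)
                    {
                      bool pa = a.tp_first_run > 0;
                      bool pb = b.tp_first_run > 0;
                      if (pa != pb)
                        return pa;  // profiled sorts before unprofiled
                      return pa && a.tp_first_run < b.tp_first_run;
                    });

  std::vector<std::vector<func>> parts;
  if (max_per_partition == 0)
    return parts;  // nothing can be placed

  for (const auto &f : funcs)
    {
      if (parts.empty () || parts.back ().size () == max_per_partition)
        parts.emplace_back ();
      parts.back ().push_back (f);
    }
  return parts;
}
```

Because only a fraction of the executed functions are sampled with AutoFDO, the profiled group is usually small, which is what makes packing it into one or a few partitions plausible.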
> 
> (4) Option to disable profile driven opts.
> The patch adds option -fauto-profile-reorder-only which only enables
> time-profile reordering with AutoFDO (and disables profile driven
> opts):
> (a) Useful as a debugging aid to isolate regression to either function
> reordering or profile driven opts.
> (b) For our use case, it's also seemingly useful as a stopgap measure
> to avoid regressions with AutoFDO profile driven opts, due to issues
> with profile quality obtained with merging of SPE and non SPE
> profiles.
> We're actively working on resolving this.
> (c) Possibly useful for architectures which do not support branch
> sampling.
> The option is disabled by default.
> 
> Ideally, I would like to make it a param (and not user facing option),
> but I am not able to control enabling/disabling options in
> opts.cc:common_handle_option based on param value, will investigate
> this further.
> 
> * Results
> 
> On one large internal workload, the patch (along with sourcefile
> tracking patch), gives an uplift of 32.63% compared to LTO, and 8.07%
> compared to LTO + AutoFDO trunk, and for another workload it gives an
> uplift of 15.31% compared to LTO, and 7.76% compared to LTO + AutoFDO
> trunk.
> I will try benchmarking with SPEC2017.
> 
> Will be grateful for suggestions on how to proceed further.
Hi,
ping: https://gcc.gnu.org/pipermail/gcc-patches/2025-October/696758.html

Thanks,
Prathamesh
> 
> Signed-off-by: Prathamesh Kulkarni <[email protected]>
> 
> Thanks,
> Prathamesh
