> On 1/21/21 3:01 PM, Jan Hubicka wrote:
> > > 
> > > Plus I'm planning to send one more patch that will ignore time profile
> > > when -fprofile-reproducible != serial.
> > 
> > Why do you need to disable time profiling?
> 
> Because you can have 2 training runs (running in parallel) when order is:
> runA: foo -> bar
> runB: bar -> foo
> 
> Then based on order of profile merging you get a final output.

For this reason we merge by computing average, which is stable over
reordering the indices....

Honza
> 
> I would like to address it with the attached patch.
> 
> Martin
> 
> > 
> > Honza
> > 
> 
> From fb4bc6f4b4b106d38fbf710f87e128d26fc1b988 Mon Sep 17 00:00:00 2001
> From: Martin Liska <mli...@suse.cz>
> Date: Thu, 21 Jan 2021 09:22:45 +0100
> Subject: [PATCH 2/2] Consider time profilers only when
>  -fprofile-reproducible=serial.
> 
> gcc/ChangeLog:
> 
>     PR gcov-profile/98739
>     * cgraphunit.c (expand_all_functions): Consider tp_first_run
>     only when -fprofile-reproducible=serial.
> 
> gcc/lto/ChangeLog:
> 
>     PR gcov-profile/98739
>     * lto-partition.c (lto_balanced_map): Consider tp_first_run
>     only when -fprofile-reproducible=serial.
> ---
>  gcc/cgraphunit.c        | 5 +++--
>  gcc/lto/lto-partition.c | 3 ++-
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index b401f0817a3..042c03d819e 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -1961,8 +1961,9 @@ expand_all_functions (void)
>      }
>  
>    /* First output functions with time profile in specified order.  */
> -  qsort (tp_first_run_order, tp_first_run_order_pos,
> -         sizeof (cgraph_node *), tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +    qsort (tp_first_run_order, tp_first_run_order_pos,
> +           sizeof (cgraph_node *), tp_first_run_node_cmp);
>    for (i = 0; i < tp_first_run_order_pos; i++)
>      {
>        node = tp_first_run_order[i];
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 15761ac9eb5..f9e632776e6 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -509,7 +509,8 @@ lto_balanced_map (int n_lto_partitions, int max_partition_size)
>       unit tends to import a lot of global trees defined there.  We should
>       get better about minimizing the function bounday, but until that
>       things works smoother if we order in source order.  */
> -  order.qsort (tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +    order.qsort (tp_first_run_node_cmp);
>    noreorder.qsort (node_cmp);
>  
>    if (dump_file)
> -- 
> 2.30.0
> 
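
To illustrate the merge-order point discussed above, here is a minimal sketch in C. It is not GCC's or libgcov's actual merging code: the `tp_merge` struct, the function names, and the two-function layout (index 0 = foo, index 1 = bar) are all hypothetical. It contrasts an average-style merge, which accumulates a sum and a run count, with a "last merged run wins" merge, which depends on the order in which the runs are merged.

```c
#include <assert.h>
#include <stddef.h>

#define N_FUNCS 2  /* hypothetical layout: index 0 = foo, index 1 = bar */

/* Hypothetical accumulator for first-run timestamps; not a GCC type.  */
struct tp_merge
{
  unsigned long sum[N_FUNCS];  /* sum of first-run positions per function */
  unsigned long runs;          /* number of training runs merged so far */
};

/* Merge one training run by accumulation.  Addition is commutative, so
   the accumulated state does not depend on the order of the merges.  */
static void
tp_merge_add (struct tp_merge *m, const unsigned long first_run[N_FUNCS])
{
  for (size_t i = 0; i < N_FUNCS; i++)
    m->sum[i] += first_run[i];
  m->runs++;
}

/* Average first-run position of function I (integer division).  */
static unsigned long
tp_merge_average (const struct tp_merge *m, size_t i)
{
  assert (m->runs > 0);
  return m->sum[i] / m->runs;
}

/* For contrast: a merge where the most recently merged run overwrites
   the previous values, so the result depends on the merge order.  */
static void
tp_overwrite (unsigned long merged[N_FUNCS],
              const unsigned long first_run[N_FUNCS])
{
  for (size_t i = 0; i < N_FUNCS; i++)
    merged[i] = first_run[i];
}
```

With runA = {1, 2} (foo first) and runB = {2, 1} (bar first), merging A then B or B then A through `tp_merge_add` yields the same averages, whereas `tp_overwrite` ends up with whichever run was merged last. Note that in this example the averages of foo and bar tie, so a real ordering would still need some other stable key to break the tie.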