> On 1/21/21 3:01 PM, Jan Hubicka wrote:
> > > 
> > > Plus I'm planning to send one more patch that will ignore the time
> > > profile when -fprofile-reproducible != serial.
> > 
> > Why do you need to disable time profiling?
> 
> Because you can have 2 training runs (running in parallel) where the order is:
> runA: foo -> bar
> runB: bar -> foo
> 
> Then the final output depends on the order in which the profiles are merged.

For this reason we merge by computing the average, which is stable under
reordering of the indices.
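
To illustrate the point about order independence, here is a minimal sketch
in C. It is not the libgcov merge code; struct tp_acc, tp_merge, tp_average
and tp_merge_all are hypothetical names made up for this example. It assumes
the merge keeps a running sum and a run count per function, so the average
does not depend on which run is merged first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator for one function's first-run index.  */
struct tp_acc
{
  uint64_t sum;   /* Sum of the first-run indices over all merged runs.  */
  uint64_t runs;  /* Number of runs merged so far.  */
};

/* Merge one run's first-run index into ACC.  Addition is commutative,
   so the order of calls does not change the accumulated state.  */
static void
tp_merge (struct tp_acc *acc, uint64_t first_run)
{
  acc->sum += first_run;
  acc->runs += 1;
}

/* Return the averaged index (integer division), or 0 if nothing was merged.  */
static uint64_t
tp_average (const struct tp_acc *acc)
{
  return acc->runs ? acc->sum / acc->runs : 0;
}

/* Merge the N indices in VALUES in the given order and return the average.  */
static uint64_t
tp_merge_all (const uint64_t *values, size_t n)
{
  struct tp_acc acc = { 0, 0 };

  assert (values != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    tp_merge (&acc, values[i]);
  return tp_average (&acc);
}
```

With the runA/runB case from above, foo has indices {1, 2} and bar has
{2, 1}; tp_merge_all gives the same value for either merge order. Note that
repeatedly averaging pairs of already-averaged values would not have this
property, which is why the sketch keeps the sum and the count separately.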

Honza
> 
> I would like to address it with the attached patch.
> 
> Martin
> 
> > 
> > Honza
> > 
> 

> From fb4bc6f4b4b106d38fbf710f87e128d26fc1b988 Mon Sep 17 00:00:00 2001
> From: Martin Liska <mli...@suse.cz>
> Date: Thu, 21 Jan 2021 09:22:45 +0100
> Subject: [PATCH 2/2] Consider time profilers only when
>  -fprofile-reproducible=serial.
> 
> gcc/ChangeLog:
> 
>       PR gcov-profile/98739
>       * cgraphunit.c (expand_all_functions): Consider tp_first_run
>       only when -fprofile-reproducible=serial.
> 
> gcc/lto/ChangeLog:
> 
>       PR gcov-profile/98739
>       * lto-partition.c (lto_balanced_map): Consider tp_first_run
>       only when -fprofile-reproducible=serial.
> ---
>  gcc/cgraphunit.c        | 5 +++--
>  gcc/lto/lto-partition.c | 3 ++-
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index b401f0817a3..042c03d819e 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -1961,8 +1961,9 @@ expand_all_functions (void)
>        }
>  
>    /* First output functions with time profile in specified order.  */
> -  qsort (tp_first_run_order, tp_first_run_order_pos,
> -      sizeof (cgraph_node *), tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +    qsort (tp_first_run_order, tp_first_run_order_pos,
> +        sizeof (cgraph_node *), tp_first_run_node_cmp);
>    for (i = 0; i < tp_first_run_order_pos; i++)
>      {
>        node = tp_first_run_order[i];
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 15761ac9eb5..f9e632776e6 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -509,7 +509,8 @@ lto_balanced_map (int n_lto_partitions, int max_partition_size)
>       unit tends to import a lot of global trees defined there.  We should
>       get better about minimizing the function bounday, but until that
>       things works smoother if we order in source order.  */
> -  order.qsort (tp_first_run_node_cmp);
> +  if (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_SERIAL)
> +    order.qsort (tp_first_run_node_cmp);
>    noreorder.qsort (node_cmp);
>  
>    if (dump_file)
> -- 
> 2.30.0
> 
