Re: [Patch] Handle profile counts truncated to 0 in coldness test

Jan Hubicka Fri, 18 Oct 2013 07:26:47 -0700

> 
> As noted above, the change to avoid the RDIV by runs decreases the
> effect of the min_run_ratio (now unlikely_count_fraction) in half. So
> we really need to increase this parameter to 32 to compare to what the
> old version of the code was doing.
> 
> Indeed, using 32 does fix the same set of issues. However, when
> setting the parameter to the old default of 4, there are 5 core dumps
> using the patch to make the unlikely code unexecutable with a 100
> training runs (see below for some examples why). This goes down to 2
> failures if I set the parameter to 10, and 0 if I set it to 20. Using
> 20 essentially means that the code has to be executed 5% of the runs,
> which seems reasonable for something like splitting. What do you
> think?


Yes, I think I am fine with pretty much any constant that is not insanely large.
20 or 32 seems fine.
> 
> 1) No context sensitivity for routine that is inlined:
> 
> In 
> /usr/local/google/home/tejohnson/gcc_trunk_5/gcc/testsuite/gcc.c-torture/execute/pr33870.c,
> we call merge_pagelist 3 times from sort_pagelist. In the profile-use
> compile merge_pagelist is inlined (I believe in all 3 locations). One
> of the conditional blocks (line 23) has a profile count that is less
> than runs/4. It turns out that from the first merge_pagelist callsite
> we always execute this block, but the other two are completely cold.
> But the profile info from all three calls is of course merged, and it
> looks like it is executed infrequently overall. So the block is split
> but then we execute it from the inline instance corresponding to the
> first call.
> 
> I'm not sure what we can do in general here. But it is another data
> point suggesting to me that the blocks should be very cold to be
> split.

Yep, these issues from code duplicating optimizations are probably going to be
unavoidable.
> 
>  2) RTL expansion:
> 
> Another was in the following code from
> /usr/local/google/home/tejohnson/gcc_trunk_5/gcc/testsuite/gcc.c-torture/execute/pr43784.c:
> 
> static struct s __attribute__((noinline)) rp(void)
> {
>   return *p;
> }
> 
> where p is a global that is assigned as:
> 
> static union u v;
> static struct s *p = &v.d.b;
> 
> RTL expansion synthesizes a bunch of tests and branches (alignment
> handling?), and guesses at the probabilities of the resulting

Yep, this should get better with my memcpy patch.  I will try to update it and
get it to mainline druing the weekend.

> branches. Unfortunately, this is less than accurate and we end up
> executing a block that is given a small likelihood, making it below
> the threshold with the default unlikely fraction of 4.

> 
> Here is the patch with the current threshold kept as 4. Let me know if
> you think we can kick this up to 20, or if we should just increase the
> threshold for splitting.
> 
> Passes regression tests, ok for trunk?
> 
> (BTW, getting close. I have a bunch of fixes to the loop unroller that
> need some cleanup, and a small fix to ssa tail merging to update the
> profiles, but I think that does it for the tests I was looking at.
> Will send those soon for review.)
> 
> Thanks,
> Teresa
> 
> 2013-10-10  Teresa Johnson  <tejohn...@google.com>
> 
>         * predict.c (probably_never_executed): Compare frequency-based
>         count to number of training runs.
>         * params.def (UNLIKELY_BB_COUNT_FRACTION): New parameter.

OK, thanks for working on this!
> 
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 203390)
> +++ predict.c   (working copy)
> @@ -237,17 +237,33 @@ probably_never_executed (struct function *fun,
>    gcc_checking_assert (fun);
>    if (profile_status_for_function (fun) == PROFILE_READ)
>      {
> -      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
> +      int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> +      if (count * unlikely_count_fraction >= profile_info->runs)
>         return false;
>        if (!frequency)
>         return true;
>        if (!ENTRY_BLOCK_PTR->frequency)
>         return false;
> -      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < 
> REG_BR_PROB_BASE)
> +      if (ENTRY_BLOCK_PTR->count)
>         {
> -         return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
> -                       ENTRY_BLOCK_PTR->frequency)
> -                 < REG_BR_PROB_BASE / 4);
> +          gcov_type computed_count;
> +          /* Check for possibility of overflow, in which case entry bb count
> +             is large enough to do the division first without losing much
> +             precision.  */
> +          if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
> +            {
> +              gcov_type scaled_count
> +                  = frequency * ENTRY_BLOCK_PTR->count *
> unlikely_count_fraction;
> +              computed_count = RDIV (scaled_count, 
> ENTRY_BLOCK_PTR->frequency);
> +            }
> +          else
> +            {
> +              computed_count = RDIV (ENTRY_BLOCK_PTR->count,
> +                                     ENTRY_BLOCK_PTR->frequency);
> +              computed_count *= frequency * unlikely_count_fraction;
> +            }
> +          if (computed_count >= profile_info->runs)
> +            return false;
>         }
>        return true;
>      }
> Index: params.def
> ===================================================================
> --- params.def  (revision 203390)
> +++ params.def  (working copy)
> @@ -373,6 +373,11 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
>          "Select fraction of the maximal frequency of executions of
> basic block in function given basic block needs to have to be
> considered hot",
>          1000, 0, 0)
> 
> +DEFPARAM(UNLIKELY_BB_COUNT_FRACTION,
> +        "unlikely-bb-count-fraction",
> +         "The minimum fraction of profile runs a given basic block
> execution count must be not to be considered unlikely",
> +        4, 1, 10000)
> +
>  DEFPARAM (PARAM_ALIGN_THRESHOLD,
>           "align-threshold",
>           "Select fraction of the maximal frequency of executions of
> basic block in function given basic block get alignment",
> 
> 
> >
> > Thanks,
> > Teresa
> >
> >>
> >> I will check the second part of patch separately.
> >> Thanks!
> >> Honza
> >
> >
> >
> > --
> > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [Patch] Handle profile counts truncated to 0 in coldness test

Reply via email to