> 2013-10-03  Teresa Johnson  <tejohn...@google.com>
>
> 	* params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.
I would not mention HOT in the parameter name. Blocks are hot/normal/unlikely
and we already have HOT_BB_FREQUENCY_FRACTION. So perhaps
UNLIKELY_BB_COUNT_FRACTION, with an explanation that it is relative to the
number of train runs?

> +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
> +	 "min-hot-run-ratio",
> +	 "The minimum ratio of profile runs to basic block execution count "
> +	 "for the block to be considered hot",

Perhaps as:

> +	 "The maximum ratio of profile runs to basic block execution count "
> +	 "for the block to be considered unlikely",

> +	 4, 0, 10000)

The lower bound should be 1, or we divide by 0.

> Index: predict.c
> ===================================================================
> --- predict.c	(revision 203152)
> +++ predict.c	(working copy)
> @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
>    gcc_checking_assert (fun);
>    if (profile_status_for_function (fun) == PROFILE_READ)
>      {
> -      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
> +      int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
> +      if (RDIV (count * min_run_ratio, profile_info->runs) > 0)

This way, if count is 1, runs needs to be min_run_ratio * 2.  So perhaps

  if (count * min_run_ratio >= profile_info->runs)

instead of the division.

> -      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
> +      if (ENTRY_BLOCK_PTR->count)
>         {
> -          return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
> -                        ENTRY_BLOCK_PTR->frequency)
> -                  < REG_BR_PROB_BASE / 4);
> +          gcov_type scaled_count
> +              = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
> +          gcov_type computed_count;
> +          /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should
> +             be large enough to do the division first without losing much
> +             precision.  */
> +          if (scaled_count/frequency/min_run_ratio != ENTRY_BLOCK_PTR->count)

I do not like the undefined behaviour triggered before the check (sure,
unsigned arithmetic would fix that, but it seems all strange).  What about
guarding the whole code with

  if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE)

For large counts the roundoff problems in the first test should not be a
problem.

> +            {
> +              computed_count = RDIV (ENTRY_BLOCK_PTR->count,
> +                                     ENTRY_BLOCK_PTR->frequency);
> +              computed_count *= frequency * min_run_ratio;
> +            }
> +          else
> +            computed_count = RDIV (scaled_count, ENTRY_BLOCK_PTR->frequency);
> +          if (RDIV (computed_count * min_run_ratio, profile_info->runs) > 0)

So on the non-overflow path you compute

  ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency)
    * min_run_ratio / runs > 0.5

I think you want

  (frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency >= runs

If entry_bb_frequency is 0, you can just return true iff frequency is > min_run_ratio.

I think the safe way to compute this is

  if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
    scaled_count = (frequency * entry_bb_count * min_run_ratio)
                   / entry_bb_frequency;
  else
    scaled_count = ((frequency * entry_bb_count) / entry_bb_frequency)
                   * min_run_ratio;

In the first path we have at most 10000*10000*10000*10000, which still fits
in a 64bit value.  In the second path the base of the fixed point arithmetic
is large enough that the division should not matter.  We never get over
entry_count * 10000, which is safe.  The first part of the statement, however,
just computes a value that should match count, so count > 10000 unless the
profile got misupdated, and the low-count roundoff errors should not matter.
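To make that arithmetic concrete, here is a small standalone model of the
suggestion (plain C, not GCC code; entry_count, entry_freq, runs and
min_run_ratio are stand-ins for ENTRY_BLOCK_PTR->count,
ENTRY_BLOCK_PTR->frequency, profile_info->runs and the new parameter).
It is only a sketch of the review comments above, not the patch itself:

  #include <stdint.h>
  #include <stdbool.h>

  #define REG_BR_PROB_BASE 10000

  /* Model of the suggested check: return true if the block looks executed
     in at least one out of min_run_ratio train runs, i.e. it should NOT be
     classified as probably never executed.  */
  bool
  executed_often_enough (int64_t count, int64_t frequency,
                         int64_t entry_count, int64_t entry_freq,
                         int64_t runs, int64_t min_run_ratio)
  {
    /* Plain comparison instead of RDIV, so count == 1 already passes
       whenever runs <= min_run_ratio.  */
    if (count * min_run_ratio >= runs)
      return true;

    if (!entry_count || !entry_freq)
      return false;

    int64_t scaled_count;
    if (entry_count < (int64_t) REG_BR_PROB_BASE * REG_BR_PROB_BASE)
      /* frequency and min_run_ratio are at most 10000 and entry_count is
         below 10000 * 10000, so the product stays around 10^16 and fits
         in 64 bits.  */
      scaled_count = frequency * entry_count * min_run_ratio / entry_freq;
    else
      /* Huge entry_count: do the division before multiplying by
         min_run_ratio; the roundoff is negligible at this magnitude.  */
      scaled_count = entry_count / entry_freq * frequency * min_run_ratio;

    return scaled_count >= runs;
  }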
So perhaps the whole code can execute only under

  if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE
      && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)

and we do not need any conditionals for scaled_count.

Does such code work with the ratio changed to 16? I think the double
multiplication in your patch may have caused problems. I will check the
second part of the patch separately.

Thanks!
Honza
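For illustration, a standalone model of that fully guarded variant (again
plain C with the same stand-in names, not a buildable GCC patch), with a
tiny driver that tries the ratio at 4 and at 16:

  #include <stdint.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define REG_BR_PROB_BASE 10000

  /* Model of probably_never_executed where one up-front guard makes the
     per-case overflow handling unnecessary.  */
  bool
  probably_never_executed_model (int64_t count, int64_t frequency,
                                 int64_t entry_count, int64_t entry_freq,
                                 int64_t runs, int64_t min_run_ratio)
  {
    if (count * min_run_ratio >= runs)
      return false;

    /* Guard: skip the estimate for large counts (the direct test above is
       accurate then); with frequency at most 10000 and entry_count below
       10000 * 10000, frequency * entry_count * min_run_ratio stays around
       10^16, so no overflow branches are needed.  */
    if (entry_count && entry_freq
        && count < REG_BR_PROB_BASE
        && entry_count < (int64_t) REG_BR_PROB_BASE * REG_BR_PROB_BASE)
      {
        int64_t scaled_count
          = frequency * entry_count * min_run_ratio / entry_freq;
        if (scaled_count >= runs)
          return false;
      }
    return true;
  }

  int
  main (void)
  {
    /* A block with count 1 over 10 train runs and a small frequency:
       classified as never executed with the ratio at 4, but not at 16.  */
    printf ("%d\n", probably_never_executed_model (1, 100, 2, 10000, 10, 4));
    printf ("%d\n", probably_never_executed_model (1, 100, 2, 10000, 10, 16));
    return 0;
  }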