Re: [Patch] Handle profile counts truncated to 0 in coldness test

Teresa Johnson Mon, 07 Oct 2013 22:45:21 -0700

On Sun, Oct 6, 2013 at 6:43 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> 2013-10-03  Teresa Johnson  <tejohn...@google.com>
>>
>>         * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.
>
> I would not mention HOT in the parameter name.  Blocks are hot/normal/unlikely
> and we HOT_BB_FREQUENCY_FRACTION.
>
> So perhaps UNLIKELY_BB_COUNT_FRACTION with an explanation that it is relative
> to number of train runs?


Ok.

>> +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
>> +        "min-hot-run-ratio",
>> +         "The minimum ratio of profile runs to basic block execution count "
>> +         "for the block to be considered hot",
> Perhaps as:
>> +         "The maximum ratio of profile runs to basic block execution count "
>> +         "for the block to be considered unlikely",
>
>> +        4, 0, 10000)
>
> lower bound should be 1 or we divide by 0.

Yep, will fix.

>> Index: predict.c
>> ===================================================================
>> --- predict.c   (revision 203152)
>> +++ predict.c   (working copy)
>> @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
>>    gcc_checking_assert (fun);
>>    if (profile_status_for_function (fun) == PROFILE_READ)
>>      {
>> -      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
>> +      int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
>> +      if (RDIV (count * min_run_ratio, profile_info->runs) > 0)
>
> This way if count is 1, runs needs to be n_run_ratio * 2
> So perhaps if (count * min_run_ratio >= profile_info->runs) instead of 
> division.

Ok.

>> -      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < 
>> REG_BR_PROB_BASE)
>> +      if (ENTRY_BLOCK_PTR->count)
>>         {
>> -         return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
>> -                       ENTRY_BLOCK_PTR->frequency)
>> -                 < REG_BR_PROB_BASE / 4);
>> +          gcov_type scaled_count
>> +              = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
>> +          gcov_type computed_count;
>> +          /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should
>> +             be large enough to do the division first without losing much
>> +             precision.  */
>> +          if (scaled_count/frequency/min_run_ratio != 
>> ENTRY_BLOCK_PTR->count)
>
> I do not like the undefined behaviour triggered before the check (sure 
> unsigned
> arithmetic would fix that, but it seems all strange).  What about guarding the
> whole code.
>
> if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE)
> For large counts the roudoff problems in first test should not be problem.

Sure, that sounds reasonable.

>> +            {
>> +              computed_count = RDIV (ENTRY_BLOCK_PTR->count,
>> +                                     ENTRY_BLOCK_PTR->frequency);
>> +              computed_count *= frequency * min_run_ratio;
>> +            }
>> +          else
>> +            computed_count = RDIV (scaled_count, 
>> ENTRY_BLOCK_PTR->frequency);
>> +          if (RDIV (computed_count * min_run_ratio, profile_info->runs) > 0)
>
> So at nonoverflow path you compute
>
>    ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency) * 
> min_run_ratio / runs > 0.5

Yes, there is an extra min_run_ratio factor in there that I need to remove!
>
> I think you want
>
>    ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency >= runs

Right, will change that too.

>
> If entry_bb_frequency is 0, you can just return true iff frequency is > 
> min_run_ratio.

We already return false conservatively if entry_bb_frequency is 0. Do
we really want to change this - I'm not sure it makes sense to compare
the frequency to the run ratio directly.

>
> I think safe way to compute this is
>
> if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
>   scaled_count = ((frequency * entry_bb_count * min_run_ratio) / 
> entry_bb_frequency;
> else
>   scaled_count = ((frequency * entry_bb_count) / entry_bb_frequency) * 
> min_run_ratio
>
> In first path, we have at most 10000*10000*10000*10000 that still fits in 
> 64bit
> value.
>
> In the second path the base of fixed point arithmetic is large enough so the
> division should not matter. We never get over entry_count * 10000 that is 
> safe.
> The first part of statement however just compute value that should match count
> so count > 10000 unless profile got misupated and the low-count roundoff 
> errors should
> not matter. So perhaps the whole code can execute only if
> if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE
>     && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
> and we do not need any conditionals for scaled_count
>
> Does such code work with ratio changed to 16?  I think the double 
> multiplication in your
> patch may have caused problems.

Let me fix this up and will get back.

Thanks,
Teresa

>
> I will check the second part of patch separately.
> Thanks!
> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [Patch] Handle profile counts truncated to 0 in coldness test

Reply via email to