Re: [Patch, tree-optimization, predcom] Improve unroll factor for predictive commoning

Ajit Agarwal Mon, 15 Jul 2024 05:37:43 -0700

Hello Richard:

On 13/07/24 8:16 pm, Ajit Agarwal wrote:
> Hello Richard:
> 
> On 12/07/24 6:20 pm, Richard Biener wrote:
>> On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal <aagar...@linux.ibm.com> wrote:
>>>
>>> Hello Richard:
>>>
>>> On 11/07/24 2:21 pm, Richard Biener wrote:
>>>> On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagar...@linux.ibm.com> 
>>>> wrote:
>>>>>
>>>>> Hello All:
>>>>>
>>>>> Unroll factor is determined with max distance across loop iterations.
>>>>> The logic for determining the loop unroll factor is based on
>>>>> data dependency across loop iterations.
>>>>>
>>>>> The max distance across loop iterations is the unrolling factor
>>>>> that helps in predictive commoning.
>>>>
>>>> The old comment in the code says
>>>>
>>>>> -      /* The best unroll factor for this chain is equal to the number of
>>>>> -        temporary variables that we create for it.  */
>>>>
>>>> why is that wrong and why is the max dependence distance more correct?
>>>>
>>>> Do you have a testcase that shows how this makes a (positive) difference?
>>>>
>>>
>>> There is nothing wrong in the existing implementation of unroll
>>> factor for predictive commoning.
>>>
>>> But with max dependence distance we get performance improvement
>>> with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without
>>> changes. Improvement in benchmarks with max dependence distance
>>> changes.
>>>
>>> I have used the following flags:
>>> -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning 
>>> -fno-tree-pre
>>>
>>> With above flags I ran with and without changes.
>>
>> A 0.01% geomean improvement is noise.  Why did you disable PRE?
>>
> 
> I have changed the flags. Now I changed the flags to -O3
> -fpredictive-commoning -funroll-loops. With these flags
> I am measuring the performance with spec 2017 benchmarks.
> Would let you know the results by Monday.
>


With the changes in this patch, 500.perlbench_r gave gain of
2.56% (spec 2017 INT) benchmarks. There is no degradation
with other benchmarks.

Thanks & Regards
Ajit
  
>>> There is no degradation with spec 2017 (FP benchmarks).
>>>
>>> Because in predictive commoning we reuse values computed in
>>> earlier iterations of a loop in the later ones, max distance is the
>>> better choice.
>>
>> The re-use distance is the same though.  So your change merely increases
>> the unroll factor?  Or can you explain why there is more re-use with
>> your change.
>>
> 
> With -O3 -fpredictive-commoning -funroll-loops many spec 2017 benchmarks
> increases unroll factor with my changes.
> 
> I am traversing data dependence relation vector, get the distance
> and max distance is the unroll factor.
>  
>> Richard.
>>
> Thanks & Regards
> Ajit
>>>> Richard.
>>>>
>>>
>>> Thanks & Regards
>>> Ajit
>>>
>>>>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>>>>
>>>>> Thanks & Regards
>>>>> Ajit
>>>>>
>>>>> tree-optimization, predcom: Improve unroll factor for predictive commoning
>>>>>
>>>>> Unroll factor is determined with max distance across loop iterations.
>>>>> The logic for determining the loop unroll factor is based on
>>>>> data dependency across loop iterations.
>>>>>
>>>>> The max distance across loop iterations is the unrolling factor
>>>>> that helps in predictive commoning.
>>>>>
>>>>> 2024-07-11  Ajit Kumar Agarwal  <aagar...@linux.ibm.com>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>         * tree-predcom.cc: Change in determining unroll factor with
>>>>>         data dependence across loop iterations.
>>>>> ---
>>>>>  gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++-----------
>>>>>  1 file changed, 39 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc
>>>>> index 9844fee1e97..029b02f5990 100644
>>>>> --- a/gcc/tree-predcom.cc
>>>>> +++ b/gcc/tree-predcom.cc
>>>>> @@ -409,6 +409,7 @@ public:
>>>>>    /* Perform the predictive commoning optimization for chains, make this
>>>>>       public for being called in callback execute_pred_commoning_cbck.  */
>>>>>    void execute_pred_commoning (bitmap tmp_vars);
>>>>> +  unsigned determine_unroll_factor (const vec<chain_p> &chains);
>>>>>
>>>>>  private:
>>>>>    /* The pointer to the given loop.  */
>>>>> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain 
>>>>> (chain_p chain,
>>>>>     copies as possible.  CHAINS is the list of chains that will be
>>>>>     optimized.  */
>>>>>
>>>>> -static unsigned
>>>>> -determine_unroll_factor (const vec<chain_p> &chains)
>>>>> +unsigned
>>>>> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains)
>>>>>  {
>>>>>    chain_p chain;
>>>>> -  unsigned factor = 1, af, nfactor, i;
>>>>> +  unsigned factor = 1, i;
>>>>>    unsigned max = param_max_unroll_times;
>>>>> +  struct data_dependence_relation *ddr;
>>>>> +  unsigned nfactor = 0;
>>>>> +  int nzfactor = 0;
>>>>> +
>>>>> +  /* Best unroll factor is the maximum distance across loop
>>>>> +     iterations.  */
>>>>> +  FOR_EACH_VEC_ELT (m_dependences, i, ddr)
>>>>> +    {
>>>>> +      for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++)
>>>>> +       {
>>>>> +         lambda_vector vec = DDR_DIST_VECT (ddr, j);
>>>>> +         widest_int distance = vec[j];
>>>>> +         unsigned offset = distance.to_uhwi ();
>>>>> +         if (offset == 0)
>>>>> +           continue;
>>>>> +
>>>>> +         int dist = offset - nzfactor;
>>>>> +         if (dist  == 0)
>>>>> +           continue;
>>>>>
>>>>> +         if (nfactor == 0)
>>>>> +           {
>>>>> +             nfactor = offset;
>>>>> +             nzfactor = offset;
>>>>> +           }
>>>>> +         else if (dist <= nzfactor)
>>>>> +           nfactor = offset;
>>>>> +
>>>>> +         if (nfactor > 0 && nfactor <= max)
>>>>> +           factor = nfactor;
>>>>> +       }
>>>>> +    }
>>>>> +
>>>>> +  int max_use = 0;
>>>>>    FOR_EACH_VEC_ELT (chains, i, chain)
>>>>>      {
>>>>>        if (chain->type == CT_INVARIANT)
>>>>> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> 
>>>>> &chains)
>>>>>           continue;
>>>>>         }
>>>>>
>>>>> -      /* The best unroll factor for this chain is equal to the number of
>>>>> -        temporary variables that we create for it.  */
>>>>> -      af = chain->length;
>>>>>        if (chain->has_max_use_after)
>>>>> -       af++;
>>>>> -
>>>>> -      nfactor = factor * af / gcd (factor, af);
>>>>> -      if (nfactor <= max)
>>>>> -       factor = nfactor;
>>>>> +       max_use++;
>>>>>      }
>>>>> -
>>>>> +  factor += max_use;
>>>>>    return factor;
>>>>>  }
>>>>>
>>>>> --
>>>>> 2.43.5
>>>>>

Re: [Patch, tree-optimization, predcom] Improve unroll factor for predictive commoning

Reply via email to