Hello Richard: On 12/07/24 6:20 pm, Richard Biener wrote: > On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal <aagar...@linux.ibm.com> wrote: >> >> Hello Richard: >> >> On 11/07/24 2:21 pm, Richard Biener wrote: >>> On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagar...@linux.ibm.com> >>> wrote: >>>> >>>> Hello All: >>>> >>>> Unroll factor is determined with max distance across loop iterations. >>>> The logic for determining the loop unroll factor is based on >>>> data dependency across loop iterations. >>>> >>>> The max distance across loop iterations is the unrolling factor >>>> that helps in predictive commoning. >>> >>> The old comment in the code says >>> >>>> - /* The best unroll factor for this chain is equal to the number of >>>> - temporary variables that we create for it. */ >>> >>> why is that wrong and why is the max dependence distance more correct? >>> >>> Do you have a testcase that shows how this makes a (positive) difference? >>> >> >> There is nothing wrong in the existing implementation of unroll >> factor for predictive commoning. >> >> But with max dependence distance we get performance improvement >> with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without >> changes. Improvement in benchmarks with max dependence distance >> changes. >> >> I have used the following flags: >> -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning >> -fno-tree-pre >> >> With above flags I ran with and without changes. > > A 0.01% geomean improvement is noise. Why did you disable PRE? >
I have changed the flags used. Now I change the flags to -O3 and -fpredictive-commoning -funroll-loops. I am measuring the performance with the above flags for spec 2017 benchmarks and would let you know the performance by Monday. >> There is no degradation with spec 2017 (FP benchmarks). >> >> Because in predictive commoning we reuse values computed in >> earlier iterations of a loop in the later ones, max distance is the >> better choice. > > The re-use distance is the same though. So your change merely increases > the unroll factor? Or can you explain why there is more re-use with > your change. > With -O3 flag and -fpredictive-commoning -funroll-loops in spec 2017 benchmarks many benchmarks increases the unroll factor with my changes in predictive commoning determine_unroll_factor. I have traversed the data dependence relation vector and get the distance and the max data dependence distance is the unroll factor. > Richard. > Thanks & Regards Ajit >>> Richard. >>> >> >> Thanks & Regards >> Ajit >> >>>> Bootstrapped and regtested on powerpc64-linux-gnu. >>>> >>>> Thanks & Regards >>>> Ajit >>>> >>>> tree-optimization, predcom: Improve unroll factor for predictive commoning >>>> >>>> Unroll factor is determined with max distance across loop iterations. >>>> The logic for determining the loop unroll factor is based on >>>> data dependency across loop iterations. >>>> >>>> The max distance across loop iterations is the unrolling factor >>>> that helps in predictive commoning. >>>> >>>> 2024-07-11 Ajit Kumar Agarwal <aagar...@linux.ibm.com> >>>> >>>> gcc/ChangeLog: >>>> >>>> * tree-predcom.cc: Change in determining unroll factor with >>>> data dependence across loop iterations. >>>> --- >>>> gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++----------- >>>> 1 file changed, 39 insertions(+), 12 deletions(-) >>>> >>>> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc >>>> index 9844fee1e97..029b02f5990 100644 >>>> --- a/gcc/tree-predcom.cc >>>> +++ b/gcc/tree-predcom.cc >>>> @@ -409,6 +409,7 @@ public: >>>> /* Perform the predictive commoning optimization for chains, make this >>>> public for being called in callback execute_pred_commoning_cbck. */ >>>> void execute_pred_commoning (bitmap tmp_vars); >>>> + unsigned determine_unroll_factor (const vec<chain_p> &chains); >>>> >>>> private: >>>> /* The pointer to the given loop. */ >>>> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p >>>> chain, >>>> copies as possible. CHAINS is the list of chains that will be >>>> optimized. */ >>>> >>>> -static unsigned >>>> -determine_unroll_factor (const vec<chain_p> &chains) >>>> +unsigned >>>> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains) >>>> { >>>> chain_p chain; >>>> - unsigned factor = 1, af, nfactor, i; >>>> + unsigned factor = 1, i; >>>> unsigned max = param_max_unroll_times; >>>> + struct data_dependence_relation *ddr; >>>> + unsigned nfactor = 0; >>>> + int nzfactor = 0; >>>> + >>>> + /* Best unroll factor is the maximum distance across loop >>>> + iterations. */ >>>> + FOR_EACH_VEC_ELT (m_dependences, i, ddr) >>>> + { >>>> + for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++) >>>> + { >>>> + lambda_vector vec = DDR_DIST_VECT (ddr, j); >>>> + widest_int distance = vec[j]; >>>> + unsigned offset = distance.to_uhwi (); >>>> + if (offset == 0) >>>> + continue; >>>> + >>>> + int dist = offset - nzfactor; >>>> + if (dist == 0) >>>> + continue; >>>> >>>> + if (nfactor == 0) >>>> + { >>>> + nfactor = offset; >>>> + nzfactor = offset; >>>> + } >>>> + else if (dist <= nzfactor) >>>> + nfactor = offset; >>>> + >>>> + if (nfactor > 0 && nfactor <= max) >>>> + factor = nfactor; >>>> + } >>>> + } >>>> + >>>> + int max_use = 0; >>>> FOR_EACH_VEC_ELT (chains, i, chain) >>>> { >>>> if (chain->type == CT_INVARIANT) >>>> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> >>>> &chains) >>>> continue; >>>> } >>>> >>>> - /* The best unroll factor for this chain is equal to the number of >>>> - temporary variables that we create for it. */ >>>> - af = chain->length; >>>> if (chain->has_max_use_after) >>>> - af++; >>>> - >>>> - nfactor = factor * af / gcd (factor, af); >>>> - if (nfactor <= max) >>>> - factor = nfactor; >>>> + max_use++; >>>> } >>>> - >>>> + factor += max_use; >>>> return factor; >>>> } >>>> >>>> -- >>>> 2.43.5 >>>>