Hello Richard: On 13/07/24 8:16 pm, Ajit Agarwal wrote: > Hello Richard: > > On 12/07/24 6:20 pm, Richard Biener wrote: >> On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal <aagar...@linux.ibm.com> wrote: >>> >>> Hello Richard: >>> >>> On 11/07/24 2:21 pm, Richard Biener wrote: >>>> On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal <aagar...@linux.ibm.com> >>>> wrote: >>>>> >>>>> Hello All: >>>>> >>>>> Unroll factor is determined with max distance across loop iterations. >>>>> The logic for determining the loop unroll factor is based on >>>>> data dependency across loop iterations. >>>>> >>>>> The max distance across loop iterations is the unrolling factor >>>>> that helps in predictive commoning. >>>> >>>> The old comment in the code says >>>> >>>>> - /* The best unroll factor for this chain is equal to the number of >>>>> - temporary variables that we create for it. */ >>>> >>>> why is that wrong and why is the max dependence distance more correct? >>>> >>>> Do you have a testcase that shows how this makes a (positive) difference? >>>> >>> >>> There is nothing wrong in the existing implementation of unroll >>> factor for predictive commoning. >>> >>> But with max dependence distance we get performance improvement >>> with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without >>> changes. Improvement in benchmarks with max dependence distance >>> changes. >>> >>> I have used the following flags: >>> -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning >>> -fno-tree-pre >>> >>> With above flags I ran with and without changes. >> >> A 0.01% geomean improvement is noise. Why did you disable PRE? >> > > I have changed the flags. Now I changed the flags to -O3 > -fpredictive-commoning -funroll-loops. With these flags > I am measuring the performance with spec 2017 benchmarks. > Would let you know the results by Monday. >
With the changes in this patch, 500.perlbench_r gave gain of 2.56% (spec 2017 INT) benchmarks. There is no degradation with other benchmarks. Thanks & Regards Ajit >>> There is no degradation with spec 2017 (FP benchmarks). >>> >>> Because in predictive commoning we reuse values computed in >>> earlier iterations of a loop in the later ones, max distance is the >>> better choice. >> >> The re-use distance is the same though. So your change merely increases >> the unroll factor? Or can you explain why there is more re-use with >> your change. >> > > With -O3 -fpredictive-commoning -funroll-loops many spec 2017 benchmarks > increases unroll factor with my changes. > > I am traversing data dependence relation vector, get the distance > and max distance is the unroll factor. > >> Richard. >> > Thanks & Regards > Ajit >>>> Richard. >>>> >>> >>> Thanks & Regards >>> Ajit >>> >>>>> Bootstrapped and regtested on powerpc64-linux-gnu. >>>>> >>>>> Thanks & Regards >>>>> Ajit >>>>> >>>>> tree-optimization, predcom: Improve unroll factor for predictive commoning >>>>> >>>>> Unroll factor is determined with max distance across loop iterations. >>>>> The logic for determining the loop unroll factor is based on >>>>> data dependency across loop iterations. >>>>> >>>>> The max distance across loop iterations is the unrolling factor >>>>> that helps in predictive commoning. >>>>> >>>>> 2024-07-11 Ajit Kumar Agarwal <aagar...@linux.ibm.com> >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * tree-predcom.cc: Change in determining unroll factor with >>>>> data dependence across loop iterations. >>>>> --- >>>>> gcc/tree-predcom.cc | 51 ++++++++++++++++++++++++++++++++++----------- >>>>> 1 file changed, 39 insertions(+), 12 deletions(-) >>>>> >>>>> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc >>>>> index 9844fee1e97..029b02f5990 100644 >>>>> --- a/gcc/tree-predcom.cc >>>>> +++ b/gcc/tree-predcom.cc >>>>> @@ -409,6 +409,7 @@ public: >>>>> /* Perform the predictive commoning optimization for chains, make this >>>>> public for being called in callback execute_pred_commoning_cbck. */ >>>>> void execute_pred_commoning (bitmap tmp_vars); >>>>> + unsigned determine_unroll_factor (const vec<chain_p> &chains); >>>>> >>>>> private: >>>>> /* The pointer to the given loop. */ >>>>> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain >>>>> (chain_p chain, >>>>> copies as possible. CHAINS is the list of chains that will be >>>>> optimized. */ >>>>> >>>>> -static unsigned >>>>> -determine_unroll_factor (const vec<chain_p> &chains) >>>>> +unsigned >>>>> +pcom_worker::determine_unroll_factor (const vec<chain_p> &chains) >>>>> { >>>>> chain_p chain; >>>>> - unsigned factor = 1, af, nfactor, i; >>>>> + unsigned factor = 1, i; >>>>> unsigned max = param_max_unroll_times; >>>>> + struct data_dependence_relation *ddr; >>>>> + unsigned nfactor = 0; >>>>> + int nzfactor = 0; >>>>> + >>>>> + /* Best unroll factor is the maximum distance across loop >>>>> + iterations. */ >>>>> + FOR_EACH_VEC_ELT (m_dependences, i, ddr) >>>>> + { >>>>> + for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++) >>>>> + { >>>>> + lambda_vector vec = DDR_DIST_VECT (ddr, j); >>>>> + widest_int distance = vec[j]; >>>>> + unsigned offset = distance.to_uhwi (); >>>>> + if (offset == 0) >>>>> + continue; >>>>> + >>>>> + int dist = offset - nzfactor; >>>>> + if (dist == 0) >>>>> + continue; >>>>> >>>>> + if (nfactor == 0) >>>>> + { >>>>> + nfactor = offset; >>>>> + nzfactor = offset; >>>>> + } >>>>> + else if (dist <= nzfactor) >>>>> + nfactor = offset; >>>>> + >>>>> + if (nfactor > 0 && nfactor <= max) >>>>> + factor = nfactor; >>>>> + } >>>>> + } >>>>> + >>>>> + int max_use = 0; >>>>> FOR_EACH_VEC_ELT (chains, i, chain) >>>>> { >>>>> if (chain->type == CT_INVARIANT) >>>>> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec<chain_p> >>>>> &chains) >>>>> continue; >>>>> } >>>>> >>>>> - /* The best unroll factor for this chain is equal to the number of >>>>> - temporary variables that we create for it. */ >>>>> - af = chain->length; >>>>> if (chain->has_max_use_after) >>>>> - af++; >>>>> - >>>>> - nfactor = factor * af / gcd (factor, af); >>>>> - if (nfactor <= max) >>>>> - factor = nfactor; >>>>> + max_use++; >>>>> } >>>>> - >>>>> + factor += max_use; >>>>> return factor; >>>>> } >>>>> >>>>> -- >>>>> 2.43.5 >>>>>