On Fri, Nov 22, 2019 at 3:19 PM Richard Biener <richard.guent...@gmail.com> wrote: > > On November 22, 2019 6:51:38 AM GMT+01:00, Li Jia He <heli...@linux.ibm.com> > wrote: > > > > > >On 2019/11/21 8:10 PM, Richard Biener wrote: > >> On Thu, Nov 21, 2019 at 10:22 AM Li Jia He <heli...@linux.ibm.com> > >wrote: > >>> > >>> Hi, > >>> > >>> I found for the follow code: > >>> > >>> #define N 256 > >>> int a[N][N][N], b[N][N][N]; > >>> int d[N][N], c[N][N]; > >>> void __attribute__((noinline)) > >>> double_reduc (int n) > >>> { > >>> for (int k = 0; k < n; k++) > >>> { > >>> for (int l = 0; l < n; l++) > >>> { > >>> c[k][l] = 0; > >>> for (int m = 0; m < n; m++) > >>> c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m]; > >>> } > >>> } > >>> } > >>> > >>> I dumped the file after loop interchange and got the following > >information: > >>> > >>> <bb 3> [local count: 118111600]: > >>> # m_46 = PHI <0(7), m_45(11)> > >>> # ivtmp_44 = PHI <_42(7), ivtmp_43(11)> > >>> _39 = _49 + 1; > >>> > >>> <bb 4> [local count: 955630224]: > >>> # l_48 = PHI <0(3), l_47(12)> > >>> # ivtmp_41 = PHI <_39(3), ivtmp_40(12)> > >>> c_I_I_lsm.5_18 = c[k_28][l_48]; > >>> c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0; > >>> _2 = a[k_28][m_46][l_48]; > >>> _3 = d[k_28][m_46]; > >>> _4 = _2 * _3; > >>> _5 = b[k_28][m_46][l_48]; > >>> _6 = _3 * _5; > >>> _7 = _4 + _6; > >>> _8 = _7 + c_I_I_lsm.5_53; > >>> c[k_28][l_48] = _8; > >>> l_47 = l_48 + 1; > >>> ivtmp_40 = ivtmp_41 - 1; > >>> if (ivtmp_40 != 0) > >>> goto <bb 12>; [89.00%] > >>> else > >>> goto <bb 5>; [11.00%] > >>> > >>> we can see '_3 = d[k_28][m_46];' is a loop invariant. > >>> Do we need to add a loop invariant motion pass after the loop > >interchange? > >> > >> There is one at the end of the loop pipeline. > > > >Hi, > > > >The one at the end of the loop pipeline may miss some optimization > >opportunities. If we vectorize the above code (a.c.158t.vect), we > >can get information similar to the following: > > > >bb 3: > > # m_46 = PHI <0(7), m_45(11)> // loop m, outer loop > > if (_59 <= 2) > > goto bb 20; > > else > > goto bb 15; > > > >bb 15: > > _89 = d[k_28][m_46]; > > vect_cst__90 = {_89, _89, _89, _89}; > > > >bb 4: > > # l_48 = PHI <l_47(12), 0(15)> // loop l, inner loop > > vect__6.23_100 = vect_cst__99 * vect__5.22_98; > > if (ivtmp_110 < bnd.8_1) > > goto bb 12; > > else > > goto bb 17; > > > >bb 20: > >bb 18: > > _27 = d[k_28][m_46]; > >if (ivtmp_12 != 0) > > goto bb 19; > > else > > goto bb 21; > > > >Vectorization will do some conversions in this case. We can see > >‘ _89 = d[k_28][m_46];’ and ‘_27 = d[k_28][m_46];’ are loop invariant > >relative to loop l. We can move ‘d[k_28][m_46]’ to the front of > >‘if (_59 <= 2)’ to get rid of loading data from memory in both > >branches. > > > >The one at at the end of the loop pipeline can't handle this situation. > >If we move d[k_28][m_46] from loop l to loop m before doing > >vectorization, we can get rid of this situation. > > But we can't run every pass after every other. With multiple passes having > ordering issues is inevitable. > > Now - interchange could trigger a region based invariant motion just for the > nest it interchanged. But that doesn't exist right now. With data reference/dependence information in the pass, I think it could be quite straightforward. Didn't realize that we need it before.
Thanks, bin > > Richard.