On 11/23/19 11:26 PM, Bin.Cheng wrote:
> On Fri, Nov 22, 2019 at 3:23 PM Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Fri, Nov 22, 2019 at 3:19 PM Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On November 22, 2019 6:51:38 AM GMT+01:00, Li Jia He
>>> <heli...@linux.ibm.com> wrote:
>>>>
>>>> On 2019/11/21 8:10 PM, Richard Biener wrote:
>>>>> On Thu, Nov 21, 2019 at 10:22 AM Li Jia He <heli...@linux.ibm.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> For the following code:
>>>>>>
>>>>>> #define N 256
>>>>>> int a[N][N][N], b[N][N][N];
>>>>>> int d[N][N], c[N][N];
>>>>>> void __attribute__((noinline))
>>>>>> double_reduc (int n)
>>>>>> {
>>>>>>   for (int k = 0; k < n; k++)
>>>>>>     {
>>>>>>       for (int l = 0; l < n; l++)
>>>>>>         {
>>>>>>           c[k][l] = 0;
>>>>>>           for (int m = 0; m < n; m++)
>>>>>>             c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> I dumped the file after loop interchange and got the following information:
>>>>>>   <bb 3> [local count: 118111600]:
>>>>>>   # m_46 = PHI <0(7), m_45(11)>
>>>>>>   # ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
>>>>>>   _39 = _49 + 1;
>>>>>>
>>>>>>   <bb 4> [local count: 955630224]:
>>>>>>   # l_48 = PHI <0(3), l_47(12)>
>>>>>>   # ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
>>>>>>   c_I_I_lsm.5_18 = c[k_28][l_48];
>>>>>>   c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
>>>>>>   _2 = a[k_28][m_46][l_48];
>>>>>>   _3 = d[k_28][m_46];
>>>>>>   _4 = _2 * _3;
>>>>>>   _5 = b[k_28][m_46][l_48];
>>>>>>   _6 = _3 * _5;
>>>>>>   _7 = _4 + _6;
>>>>>>   _8 = _7 + c_I_I_lsm.5_53;
>>>>>>   c[k_28][l_48] = _8;
>>>>>>   l_47 = l_48 + 1;
>>>>>>   ivtmp_40 = ivtmp_41 - 1;
>>>>>>   if (ivtmp_40 != 0)
>>>>>>     goto <bb 12>; [89.00%]
>>>>>>   else
>>>>>>     goto <bb 5>; [11.00%]
>>>>>>
>>>>>> We can see '_3 = d[k_28][m_46];' is a loop invariant.
>>>>>> Do we need to add a loop invariant motion pass after the loop interchange?
>>>>> There is one at the end of the loop pipeline.
>>>> Hi,
>>>>
>>>> The one at the end of the loop pipeline may miss some optimization
>>>> opportunities. If we vectorize the above code (a.c.158t.vect), we
>>>> can get information similar to the following:
>>>>
>>>> bb 3:
>>>>   # m_46 = PHI <0(7), m_45(11)>  // loop m, outer loop
>>>>   if (_59 <= 2)
>>>>     goto bb 20;
>>>>   else
>>>>     goto bb 15;
>>>>
>>>> bb 15:
>>>>   _89 = d[k_28][m_46];
>>>>   vect_cst__90 = {_89, _89, _89, _89};
>>>>
>>>> bb 4:
>>>>   # l_48 = PHI <l_47(12), 0(15)>  // loop l, inner loop
>>>>   vect__6.23_100 = vect_cst__99 * vect__5.22_98;
>>>>   if (ivtmp_110 < bnd.8_1)
>>>>     goto bb 12;
>>>>   else
>>>>     goto bb 17;
>>>>
>>>> bb 20:
>>>> bb 18:
>>>>   _27 = d[k_28][m_46];
>>>>   if (ivtmp_12 != 0)
>>>>     goto bb 19;
>>>>   else
>>>>     goto bb 21;
>>>>
>>>> Vectorization will do some conversions in this case. We can see
>>>> ‘_89 = d[k_28][m_46];’ and ‘_27 = d[k_28][m_46];’ are loop invariant
>>>> relative to loop l. We can move ‘d[k_28][m_46]’ to the front of
>>>> ‘if (_59 <= 2)’ to get rid of loading data from memory in both
>>>> branches.
>>>>
>>>> The one at the end of the loop pipeline can't handle this situation.
>>>> If we move d[k_28][m_46] from loop l to loop m before doing
>>>> vectorization, we can get rid of this situation.
>>> But we can't run every pass after every other. With multiple passes,
>>> having ordering issues is inevitable.
>>>
>>> Now - interchange could trigger a region based invariant motion just for
>>> the nest it interchanged. But that doesn't exist right now.
>> With data reference/dependence information in the pass, I think it
>> could be quite straightforward. Didn't realize that we need it
>> before.
> FYI, attachment is a simple fix in loop interchange for the reported
> issue. It's untested, and not for GCC10 either.
>
> Thanks,
> bin
>> Thanks,
>> bin
>>> Richard.
>>>
>>> linterchange-invariant-dataref-motion.patch
>>>
So it looks like Martin and Richi are working on this right now. I'm
going to drop this from my queue.
jeff