On 11/23/19 11:26 PM, Bin.Cheng wrote:
> On Fri, Nov 22, 2019 at 3:23 PM Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Fri, Nov 22, 2019 at 3:19 PM Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On November 22, 2019 6:51:38 AM GMT+01:00, Li Jia He 
>>> <heli...@linux.ibm.com> wrote:
>>>>
>>>> On 2019/11/21 8:10 PM, Richard Biener wrote:
>>>>> On Thu, Nov 21, 2019 at 10:22 AM Li Jia He <heli...@linux.ibm.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I found an issue with the following code:
>>>>>>
>>>>>> #define N 256
>>>>>> int a[N][N][N], b[N][N][N];
>>>>>> int d[N][N], c[N][N];
>>>>>> void __attribute__((noinline))
>>>>>> double_reduc (int n)
>>>>>> {
>>>>>>     for (int k = 0; k < n; k++)
>>>>>>     {
>>>>>>       for (int l = 0; l < n; l++)
>>>>>>        {
>>>>>>          c[k][l] = 0;
>>>>>>           for (int m = 0; m < n; m++)
>>>>>>             c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
>>>>>>        }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> I dumped the file after loop interchange and got the following
>>>>>> information:
>>>>>> <bb 3> [local count: 118111600]:
>>>>>>     # m_46 = PHI <0(7), m_45(11)>
>>>>>>     # ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
>>>>>>     _39 = _49 + 1;
>>>>>>
>>>>>>     <bb 4> [local count: 955630224]:
>>>>>>     # l_48 = PHI <0(3), l_47(12)>
>>>>>>     # ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
>>>>>>     c_I_I_lsm.5_18 = c[k_28][l_48];
>>>>>>     c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
>>>>>>     _2 = a[k_28][m_46][l_48];
>>>>>>     _3 = d[k_28][m_46];
>>>>>>     _4 = _2 * _3;
>>>>>>     _5 = b[k_28][m_46][l_48];
>>>>>>     _6 = _3 * _5;
>>>>>>     _7 = _4 + _6;
>>>>>>     _8 = _7 + c_I_I_lsm.5_53;
>>>>>>     c[k_28][l_48] = _8;
>>>>>>     l_47 = l_48 + 1;
>>>>>>     ivtmp_40 = ivtmp_41 - 1;
>>>>>>     if (ivtmp_40 != 0)
>>>>>>       goto <bb 12>; [89.00%]
>>>>>>     else
>>>>>>       goto <bb 5>; [11.00%]
>>>>>>
>>>>>> We can see that '_3 = d[k_28][m_46];' is invariant in loop l.
>>>>>> Do we need to add a loop invariant motion pass after loop
>>>>>> interchange?
>>>>> There is one at the end of the loop pipeline.
>>>> Hi,
>>>>
>>>> The one at the end of the loop pipeline may miss some optimization
>>>> opportunities.  If we vectorize the above code (a.c.158t.vect), we
>>>> can get information similar to the following:
>>>>
>>>> bb 3:
>>>>  # m_46 = PHI <0(7), m_45(11)>  // loop m, outer loop
>>>>   if (_59 <= 2)
>>>>     goto bb 20;
>>>>   else
>>>>     goto bb 15;
>>>>
>>>> bb 15:
>>>>   _89 = d[k_28][m_46];
>>>>   vect_cst__90 = {_89, _89, _89, _89};
>>>>
>>>> bb 4:
>>>>    # l_48 = PHI <l_47(12), 0(15)> // loop l, inner loop
>>>>   vect__6.23_100 = vect_cst__99 * vect__5.22_98;
>>>>    if (ivtmp_110 < bnd.8_1)
>>>>     goto bb 12;
>>>>   else
>>>>     goto bb 17;
>>>>
>>>> bb 20:
>>>> bb 18:
>>>>    _27 = d[k_28][m_46];
>>>> if (ivtmp_12 != 0)
>>>>     goto bb 19;
>>>>   else
>>>>     goto bb 21;
>>>>
>>>> Vectorization will do some conversions in this case.  We can see
>>>> that ‘_89 = d[k_28][m_46];’ and ‘_27 = d[k_28][m_46];’ are loop
>>>> invariant relative to loop l.  We could move ‘d[k_28][m_46]’ in front
>>>> of ‘if (_59 <= 2)’ so that the load from memory is not duplicated in
>>>> both branches.
>>>>
>>>> The one at the end of the loop pipeline can't handle this situation.
>>>> If we move d[k_28][m_46] from loop l to loop m before doing
>>>> vectorization, we can avoid the duplicated load.
>>> But we can't run every pass after every other.  With multiple passes,
>>> ordering issues are inevitable.
>>>
>>> Now - interchange could trigger a region based invariant motion just for 
>>> the nest it interchanged. But that doesn't exist right now.
>> With data reference/dependence information in the pass, I think it
>> could be quite straightforward.  I didn't realize before that we
>> needed it.
> FYI, attached is a simple fix in loop interchange for the reported
> issue. It's untested, and it is not for GCC 10 either.
>
> Thanks,
> bin
>> Thanks,
>> bin
>>> Richard.
>>>
>>> linterchange-invariant-dataref-motion.patch
>>>
So it looks like Martin and Richi are working on this right now.  I'm
going to drop this from my queue.


jeff
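
For readers following the thread, the transformation under discussion can be sketched at the source level. This is a hand-written analogue, not output from GCC and not the attached patch: it shows the nest after interchange (k, m, l order) with the load of d[k][m] hoisted out of loop l, next to the original nest. The names double_reduc_ref, double_reduc_interchanged, fill_inputs and results_match, the reduced N, and the input values are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define N 8   /* assumption: small size for illustration; the thread uses 256 */

static int a[N][N][N], b[N][N][N];
static int d[N][N];
static int c_ref[N][N], c_opt[N][N];

/* Original loop nest from the thread: k, l, m order. */
static void double_reduc_ref(int n)
{
    assert(n >= 0 && n <= N);
    for (int k = 0; k < n; k++)
        for (int l = 0; l < n; l++) {
            c_ref[k][l] = 0;
            for (int m = 0; m < n; m++)
                c_ref[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
        }
}

/* Source-level analogue of the interchanged nest: k, m, l order.
   d[k][m] does not depend on l, so it is loaded once per (k, m). */
static void double_reduc_interchanged(int n)
{
    assert(n >= 0 && n <= N);
    for (int k = 0; k < n; k++)
        for (int m = 0; m < n; m++) {
            int dkm = d[k][m];            /* hoisted invariant load */
            for (int l = 0; l < n; l++) {
                /* Mirrors 'm_46 != 0 ? c_I_I_lsm.5_18 : 0' in the dump:
                   the first m iteration starts the sum from zero. */
                int acc = (m != 0) ? c_opt[k][l] : 0;
                c_opt[k][l] = acc + a[k][m][l] * dkm + b[k][m][l] * dkm;
            }
        }
}

/* Deterministic small inputs (hypothetical values, chosen to avoid overflow). */
static void fill_inputs(void)
{
    for (int k = 0; k < N; k++)
        for (int m = 0; m < N; m++) {
            d[k][m] = m - k + 1;
            for (int l = 0; l < N; l++) {
                a[k][m][l] = k + 2 * m - l;
                b[k][m][l] = k * l - m;
            }
        }
}

/* Returns 1 if both versions produce identical results for size n. */
static int results_match(int n)
{
    fill_inputs();
    memset(c_ref, 0xff, sizeof c_ref);    /* same sentinel in untouched cells */
    memset(c_opt, 0xff, sizeof c_opt);
    double_reduc_ref(n);
    double_reduc_interchanged(n);
    return memcmp(c_ref, c_opt, sizeof c_ref) == 0;
}
```

Calling results_match(N) from a small main() is enough to compare the two versions. In the hoisted form the scalar dkm is available before any branch that selects between a vector body and a scalar fallback, which is the effect Li Jia He asks for ahead of vectorization.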

