On Fri, Dec 15, 2017 at 12:09 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Fri, Dec 15, 2017 at 11:55 AM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>> Hi,
>>> As explained in the PR, given below test case:
>>> int a[8][10] = { [2][5] = 4 }, c;
>>>
>>> int
>>> main ()
>>> {
>>>   short b;
>>>   int i, d;
>>>   for (b = 4; b >= 0; b--)
>>>     for (c = 0; c <= 6; c++)
>>>       a[c + 1][b + 2] = a[c][b + 1];
>>>   for (i = 0; i < 8; i++)
>>>     for (d = 0; d < 10; d++)
>>>       if (a[i][d] != (i == 3 && d == 6) * 4)
>>>         __builtin_abort ();
>>>   return 0;
>>>
>>> the loop nest is illegal for vectorization without reversing inner loop.  
>>> The issue
>>> is in data dependence checking of vectorizer, I believe the mentioned 
>>> revision just
>>> exposed this.  Previously the vectorization is skipped because of 
>>> unsupported memory
>>> operation.  The outer loop vectorization unrolls the outer loop into:
>>>
>>>   for (b = 4; b > 0; b -= 4)
>>>   {
>>>     for (c = 0; c <= 6; c++)
>>>       a[c + 1][6] = a[c][5];
>>>     for (c = 0; c <= 6; c++)
>>>       a[c + 1][5] = a[c][4];
>>>     for (c = 0; c <= 6; c++)
>>>       a[c + 1][4] = a[c][3];
>>>     for (c = 0; c <= 6; c++)
>>>       a[c + 1][3] = a[c][2];
>>>   }
>>> Then four inner loops are fused into:
>>>   for (b = 4; b > 0; b -= 4)
>>>   {
>>>     for (c = 0; c <= 6; c++)
>>>     {
>>>       a[c + 1][6] = a[c][5];  // S1
>>>       a[c + 1][5] = a[c][4];  // S2
>>>       a[c + 1][4] = a[c][3];
>>>       a[c + 1][3] = a[c][2];
>>>     }
>>>   }
>>
>> Note that they are not really "fused" but they are interleaved.  With
>> GIMPLE in mind
>> that makes a difference, you should get the equivalent of
>>
>>    for (c = 0; c <= 6; c++)
>>      {
>>        tem1 = a[c][5];
>>        tem2 = a[c][4];
>>        tem3 = a[c][3];
>>        tem4 = a[c][2];
>>        a[c+1][6] = tem1;
>>        a[c +1][5] = tem2;
>>         a[c+1][4] = tem3;
>>         a[c+1][3] = tem4;
>>      }
> Yeah, I will double check if this abstract breaks the patch and how.
Hmm, I think this doesn't break it, well at least for part of the
analysis, because it is loop carried (backward) dependence goes wrong,
interleaving or not with the same iteration doesn't matter here.

Thanks,
bin
>
>>
>>> The loop fusion needs to meet the dependence requirement.  Basically, GCC's 
>>> data
>>> dependence analyzer does not model dep between references in sibling loops, 
>>> but
>>> in practice, fusion requirement can be checked by analyzing all data 
>>> references
>>> after fusion, and there is no backward data dependence.
>>>
>>> Apparently, the requirement is violated because we have backward data 
>>> dependence
>>> between references (a[c][5], a[c+1][5]) in S1/S2.  Note, if we reverse the 
>>> inner
>>> loop, the outer loop would become legal for vectorization.
>>>
>>> This patch fixes the issue by enforcing dependence check.  It also adds two 
>>> tests
>>> with one shouldn't be vectorized and the other should.  Bootstrap and test 
>>> on x86_64
>>> and AArch64.  Is it OK?
>>
>> I think you have identified the spot where things go wrong but I'm not
>> sure you fix the
>> problem fully.  The spot you pacth is (loop is the outer loop):
>>
>>   loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
>> ...
>>   FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
>>     {
>>       int dist = dist_v[loop_depth];
>> ...
>>       if (dist > 0 && DDR_REVERSED_P (ddr))
>>         {
>>           /* If DDR_REVERSED_P the order of the data-refs in DDR was
>>              reversed (to make distance vector positive), and the actual
>>              distance is negative.  */
>>           if (dump_enabled_p ())
>>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                              "dependence distance negative.\n");
>>
>> where you add
>>
>> +         /* When doing outer loop vectorization, we need to check if there 
>> is
>> +            backward dependence at inner loop level if dependence at the 
>> outer
>> +            loop is reversed.  See PR81740 for more information.  */
>> +         if (nested_in_vect_loop_p (loop, DR_STMT (dra))
>> +             || nested_in_vect_loop_p (loop, DR_STMT (drb)))
>> +           {
>> +             unsigned inner_depth = index_in_loop_nest (loop->inner->num,
>> +                                                        DDR_LOOP_NEST 
>> (ddr));
>> +             if (dist_v[inner_depth] < 0)
>> +               return true;
>> +           }
>>
>> but I don't understand how the dependence direction with respect to the
>> outer loop matters here.
> If the direction wrto outer loop is positive by itself, i.e,
> reversed_p equals to false, then dist is checked against max_vf.  In
> this case, it's not possible to have references refer to the same
> object?
> On the other hand, dist is not checked at all for reversed case.
> Maybe an additional check "dist < max_vf" can relax the patch a bit.
>>
>> Given there's DDR_REVERSED on the outer loop distance what does that
>> mean for the inner loop distance given the quite non-obvious code handling
>> this case in tree-data-ref.c:
>>
>>       /* Verify a basic constraint: classic distance vectors should
>>          always be lexicographically positive.
>>
>>          Data references are collected in the order of execution of
>>          the program, thus for the following loop
>>
>>          | for (i = 1; i < 100; i++)
>>          |   for (j = 1; j < 100; j++)
>>          |     {
>>          |       t = T[j+1][i-1];  // A
>>          |       T[j][i] = t + 2;  // B
>>          |     }
>>
>>          references are collected following the direction of the wind:
>>          A then B.  The data dependence tests are performed also
>>          following this order, such that we're looking at the distance
>>          separating the elements accessed by A from the elements later
>>          accessed by B.  But in this example, the distance returned by
>>          test_dep (A, B) is lexicographically negative (-1, 1), that
>>          means that the access A occurs later than B with respect to
>>          the outer loop, ie. we're actually looking upwind.  In this
>>          case we solve test_dep (B, A) looking downwind to the
>>          lexicographically positive solution, that returns the
>>          distance vector (1, -1).  */
>>       if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr)))
>>         {
>>           lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
>>           if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
>>             return false;
>>           compute_subscript_distance (ddr);
>>           if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b,
>>                                             &index_carry))
>>             return false;
>>           save_dist_v (ddr, save_v);
>>           DDR_REVERSED_P (ddr) = true;
>>
>> that is, what does dist_v[inner_depth] < 0 tell us here?  Does it tell us
>> that the dependence direction is reverse of the outer loop?
> Hmm, this part of code is a bit confusing for me.  So we always
> compute classic distance by sorting two data references in lexico
> order, which could be the opposite given we collect references by dom
> order.  IIUC, the negative dist at inner loop means a backward
> dependence for that index/loop.  It's not related to outer loop or the
> reversed_p flag.
>
> Thanks,
> bin
>>
>> CCing Micha who might remember some more details from unroll-and-jam.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> bin
>>> 2017-12-15  Bin Cheng  <bin.ch...@arm.com>
>>>
>>>         PR tree-optimization/81740
>>>         * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): In case
>>>         of outer loop vectorization, check backward dependence at inner loop
>>>         if dependence at outer loop is reversed.
>>>
>>> gcc/testsuite
>>> 2017-12-15  Bin Cheng  <bin.ch...@arm.com>
>>>
>>>         PR tree-optimization/81740
>>>         * gcc.dg/vect/pr81740-1.c: New test.
>>>         * gcc.dg/vect/pr81740-2.c: Refine test.

Reply via email to