Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization

Richard Biener Fri, 15 Dec 2017 03:56:03 -0800

On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> As explained in the PR, given below test case:
> int a[8][10] = { [2][5] = 4 }, c;
>
> int
> main ()
> {
>   short b;
>   int i, d;
>   for (b = 4; b >= 0; b--)
>     for (c = 0; c <= 6; c++)
>       a[c + 1][b + 2] = a[c][b + 1];
>   for (i = 0; i < 8; i++)
>     for (d = 0; d < 10; d++)
>       if (a[i][d] != (i == 3 && d == 6) * 4)
>         __builtin_abort ();
>   return 0;
>
> the loop nest is illegal for vectorization without reversing inner loop.  The 
> issue
> is in data dependence checking of vectorizer, I believe the mentioned 
> revision just
> exposed this.  Previously the vectorization is skipped because of unsupported 
> memory
> operation.  The outer loop vectorization unrolls the outer loop into:
>
>   for (b = 4; b > 0; b -= 4)
>   {
>     for (c = 0; c <= 6; c++)
>       a[c + 1][6] = a[c][5];
>     for (c = 0; c <= 6; c++)
>       a[c + 1][5] = a[c][4];
>     for (c = 0; c <= 6; c++)
>       a[c + 1][4] = a[c][3];
>     for (c = 0; c <= 6; c++)
>       a[c + 1][3] = a[c][2];
>   }
> Then four inner loops are fused into:
>   for (b = 4; b > 0; b -= 4)
>   {
>     for (c = 0; c <= 6; c++)
>     {
>       a[c + 1][6] = a[c][5];  // S1
>       a[c + 1][5] = a[c][4];  // S2
>       a[c + 1][4] = a[c][3];
>       a[c + 1][3] = a[c][2];
>     }
>   }


Note that they are not really "fused" but they are interleaved.  With
GIMPLE in mind
that makes a difference, you should get the equivalent of

   for (c = 0; c <= 6; c++)
     {
       tem1 = a[c][5];
       tem2 = a[c][4];
       tem3 = a[c][3];
       tem4 = a[c][2];
       a[c+1][6] = tem1;
       a[c +1][5] = tem2;
        a[c+1][4] = tem3;
        a[c+1][3] = tem4;
     }

> The loop fusion needs to meet the dependence requirement.  Basically, GCC's 
> data
> dependence analyzer does not model dep between references in sibling loops, 
> but
> in practice, fusion requirement can be checked by analyzing all data 
> references
> after fusion, and there is no backward data dependence.
>
> Apparently, the requirement is violated because we have backward data 
> dependence
> between references (a[c][5], a[c+1][5]) in S1/S2.  Note, if we reverse the 
> inner
> loop, the outer loop would become legal for vectorization.
>
> This patch fixes the issue by enforcing dependence check.  It also adds two 
> tests
> with one shouldn't be vectorized and the other should.  Bootstrap and test on 
> x86_64
> and AArch64.  Is it OK?

I think you have identified the spot where things go wrong but I'm not
sure you fix the
problem fully.  The spot you pacth is (loop is the outer loop):

  loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
...
  FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
    {
      int dist = dist_v[loop_depth];
...
      if (dist > 0 && DDR_REVERSED_P (ddr))
        {
          /* If DDR_REVERSED_P the order of the data-refs in DDR was
             reversed (to make distance vector positive), and the actual
             distance is negative.  */
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "dependence distance negative.\n");

where you add

+         /* When doing outer loop vectorization, we need to check if there is
+            backward dependence at inner loop level if dependence at the outer
+            loop is reversed.  See PR81740 for more information.  */
+         if (nested_in_vect_loop_p (loop, DR_STMT (dra))
+             || nested_in_vect_loop_p (loop, DR_STMT (drb)))
+           {
+             unsigned inner_depth = index_in_loop_nest (loop->inner->num,
+                                                        DDR_LOOP_NEST (ddr));
+             if (dist_v[inner_depth] < 0)
+               return true;
+           }

but I don't understand how the dependence direction with respect to the
outer loop matters here.

Given there's DDR_REVERSED on the outer loop distance what does that
mean for the inner loop distance given the quite non-obvious code handling
this case in tree-data-ref.c:

      /* Verify a basic constraint: classic distance vectors should
         always be lexicographically positive.

         Data references are collected in the order of execution of
         the program, thus for the following loop

         | for (i = 1; i < 100; i++)
         |   for (j = 1; j < 100; j++)
         |     {
         |       t = T[j+1][i-1];  // A
         |       T[j][i] = t + 2;  // B
         |     }

         references are collected following the direction of the wind:
         A then B.  The data dependence tests are performed also
         following this order, such that we're looking at the distance
         separating the elements accessed by A from the elements later
         accessed by B.  But in this example, the distance returned by
         test_dep (A, B) is lexicographically negative (-1, 1), that
         means that the access A occurs later than B with respect to
         the outer loop, ie. we're actually looking upwind.  In this
         case we solve test_dep (B, A) looking downwind to the
         lexicographically positive solution, that returns the
         distance vector (1, -1).  */
      if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr)))
        {
          lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr));
          if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest))
            return false;
          compute_subscript_distance (ddr);
          if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b,
                                            &index_carry))
            return false;
          save_dist_v (ddr, save_v);
          DDR_REVERSED_P (ddr) = true;

that is, what does dist_v[inner_depth] < 0 tell us here?  Does it tell us
that the dependence direction is reverse of the outer loop?

CCing Micha who might remember some more details from unroll-and-jam.

Thanks,
Richard.

> Thanks,
> bin
> 2017-12-15  Bin Cheng  <bin.ch...@arm.com>
>
>         PR tree-optimization/81740
>         * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): In case
>         of outer loop vectorization, check backward dependence at inner loop
>         if dependence at outer loop is reversed.
>
> gcc/testsuite
> 2017-12-15  Bin Cheng  <bin.ch...@arm.com>
>
>         PR tree-optimization/81740
>         * gcc.dg/vect/pr81740-1.c: New test.
>         * gcc.dg/vect/pr81740-2.c: Refine test.

Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization

Reply via email to