On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng <bin.ch...@arm.com> wrote: > Hi, > As explained in the PR, given below test case: > int a[8][10] = { [2][5] = 4 }, c; > > int > main () > { > short b; > int i, d; > for (b = 4; b >= 0; b--) > for (c = 0; c <= 6; c++) > a[c + 1][b + 2] = a[c][b + 1]; > for (i = 0; i < 8; i++) > for (d = 0; d < 10; d++) > if (a[i][d] != (i == 3 && d == 6) * 4) > __builtin_abort (); > return 0; > > the loop nest is illegal for vectorization without reversing inner loop. The > issue > is in data dependence checking of vectorizer, I believe the mentioned > revision just > exposed this. Previously the vectorization is skipped because of unsupported > memory > operation. The outer loop vectorization unrolls the outer loop into: > > for (b = 4; b > 0; b -= 4) > { > for (c = 0; c <= 6; c++) > a[c + 1][6] = a[c][5]; > for (c = 0; c <= 6; c++) > a[c + 1][5] = a[c][4]; > for (c = 0; c <= 6; c++) > a[c + 1][4] = a[c][3]; > for (c = 0; c <= 6; c++) > a[c + 1][3] = a[c][2]; > } > Then four inner loops are fused into: > for (b = 4; b > 0; b -= 4) > { > for (c = 0; c <= 6; c++) > { > a[c + 1][6] = a[c][5]; // S1 > a[c + 1][5] = a[c][4]; // S2 > a[c + 1][4] = a[c][3]; > a[c + 1][3] = a[c][2]; > } > }
Note that they are not really "fused" but they are interleaved. With GIMPLE in mind that makes a difference, you should get the equivalent of for (c = 0; c <= 6; c++) { tem1 = a[c][5]; tem2 = a[c][4]; tem3 = a[c][3]; tem4 = a[c][2]; a[c+1][6] = tem1; a[c +1][5] = tem2; a[c+1][4] = tem3; a[c+1][3] = tem4; } > The loop fusion needs to meet the dependence requirement. Basically, GCC's > data > dependence analyzer does not model dep between references in sibling loops, > but > in practice, fusion requirement can be checked by analyzing all data > references > after fusion, and there is no backward data dependence. > > Apparently, the requirement is violated because we have backward data > dependence > between references (a[c][5], a[c+1][5]) in S1/S2. Note, if we reverse the > inner > loop, the outer loop would become legal for vectorization. > > This patch fixes the issue by enforcing dependence check. It also adds two > tests > with one shouldn't be vectorized and the other should. Bootstrap and test on > x86_64 > and AArch64. Is it OK? I think you have identified the spot where things go wrong but I'm not sure you fix the problem fully. The spot you pacth is (loop is the outer loop): loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); ... FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) { int dist = dist_v[loop_depth]; ... if (dist > 0 && DDR_REVERSED_P (ddr)) { /* If DDR_REVERSED_P the order of the data-refs in DDR was reversed (to make distance vector positive), and the actual distance is negative. */ if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "dependence distance negative.\n"); where you add + /* When doing outer loop vectorization, we need to check if there is + backward dependence at inner loop level if dependence at the outer + loop is reversed. See PR81740 for more information. */ + if (nested_in_vect_loop_p (loop, DR_STMT (dra)) + || nested_in_vect_loop_p (loop, DR_STMT (drb))) + { + unsigned inner_depth = index_in_loop_nest (loop->inner->num, + DDR_LOOP_NEST (ddr)); + if (dist_v[inner_depth] < 0) + return true; + } but I don't understand how the dependence direction with respect to the outer loop matters here. Given there's DDR_REVERSED on the outer loop distance what does that mean for the inner loop distance given the quite non-obvious code handling this case in tree-data-ref.c: /* Verify a basic constraint: classic distance vectors should always be lexicographically positive. Data references are collected in the order of execution of the program, thus for the following loop | for (i = 1; i < 100; i++) | for (j = 1; j < 100; j++) | { | t = T[j+1][i-1]; // A | T[j][i] = t + 2; // B | } references are collected following the direction of the wind: A then B. The data dependence tests are performed also following this order, such that we're looking at the distance separating the elements accessed by A from the elements later accessed by B. But in this example, the distance returned by test_dep (A, B) is lexicographically negative (-1, 1), that means that the access A occurs later than B with respect to the outer loop, ie. we're actually looking upwind. In this case we solve test_dep (B, A) looking downwind to the lexicographically positive solution, that returns the distance vector (1, -1). */ if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) { lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) return false; compute_subscript_distance (ddr); if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, &index_carry)) return false; save_dist_v (ddr, save_v); DDR_REVERSED_P (ddr) = true; that is, what does dist_v[inner_depth] < 0 tell us here? Does it tell us that the dependence direction is reverse of the outer loop? CCing Micha who might remember some more details from unroll-and-jam. Thanks, Richard. > Thanks, > bin > 2017-12-15 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/81740 > * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): In case > of outer loop vectorization, check backward dependence at inner loop > if dependence at outer loop is reversed. > > gcc/testsuite > 2017-12-15 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/81740 > * gcc.dg/vect/pr81740-1.c: New test. > * gcc.dg/vect/pr81740-2.c: Refine test.