On Tue, 12 Jan 2016, Tom de Vries wrote: > Hi, > > This patch fixes PR69110, a wrong-code bug in autopar. > > > I. > > consider testcase test.c: > ... > #define N 1000 > > unsigned int i = 0; > > static void __attribute__((noinline, noclone)) > foo (void) > { > unsigned int z; > > for (z = 0; z < N; ++z) > ++i; > } > > extern void abort (void); > > int > main (void) > { > foo (); > if (i != N) > abort (); > > return 0; > } > ... > > When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the test > fails: > ... > $ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd > -P)//install/lib64 -fno-tree-loop-im > $ ./a.out > Aborted (core dumped) > $ > ... > > > II. > > Before parloops, at ivcanon we have the loop body: > ... > <bb 3>: > # z_10 = PHI <z_7(4), 0(2)> > # ivtmp_12 = PHI <ivtmp_2(4), 1000(2)> > i.1_4 = i; > _5 = i.1_4 + 1; > i = _5; > z_7 = z_10 + 1; > ivtmp_2 = ivtmp_12 - 1; > if (ivtmp_2 != 0) > goto <bb 4>; > else > goto <bb 5>; > ... > > There's a loop-carried dependency in i, that is, the read from i in iteration > z == 1 depends on the write to i in iteration z == 0. So the loop cannot be > parallelized. The test-case fails because parloops still parallelizes the > loop. > > > III. > > Since the loop carried dependency is in-memory, it is not handled by the code > analyzing reductions, since that code ignores the virtual phi. > > So, AFAIU, this loop carried dependency should be handled by the dependency > testing in loop_parallel_p. And loop_parallel_p returns true for this loop. > > A comment in loop_parallel_p reads: "Check for problems with dependences. If > the loop can be reversed, the iterations are independent." > > AFAIU, the loop order can actually be reversed. But, it cannot be executed in > parallel. > > So from this perspective, it seems in this case the comment matches the check, > but the check is not sufficient. > > > IV. > > OTOH, if we replace the declaration of i with i[1], and replace the references > of i with i[0], we see that loop_parallel_p fails. So the loop_parallel_p > check in this case seems sufficient, and there's something else that causes > the check to fail in this case. > > The difference is in the generated data ref: > - in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to > vector with a single element: access function 0. > - in the 'i' case, we set DR_ACCESS_FNS to NULL. > > This difference causes different handling in the dependency generation, in > particular in add_distance_for_zero_overlaps which has no effect for the 'i' > case because DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of the NULL > access_fns of both the source and sink data refs). > > From this perspective, it seems that the loop_parallel_p check is sufficient, > and that dr_analyze_indices shouldn't return a NULL access_fns for 'i'. > > > V. > > When compiling with graphite using -floop-parallelize-all --param > graphite-min-loops-per-function=1, we find: > ... > [scop-detection-fail] Graphite cannot handle data-refs in stmt: > # VUSE <.MEM_11> > i.1_4 = i; > ... > > The function scop_detection::stmt_has_simple_data_refs_p returns false because > of the code recently added for PR66980 at r228357: > ... > int nb_subscripts = DR_NUM_DIMENSIONS (dr); > > if (nb_subscripts < 1) > { > free_data_refs (drs); > return false; > } > ... > > [ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ] > > This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis has > failed'. > > From this perspective, it seems that the dependence handling should bail out > once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or DR_ACCESS_FNS == > 0). > > > VI. > > This test-case used to pass in 4.6 because in find_data_references_in_stmt we > had: > ... > /* FIXME -- data dependence analysis does not work correctly for > objects with invariant addresses in loop nests. Let us fail > here until the problem is fixed. */ > if (dr_address_invariant_p (dr) && nest) > { > free_data_ref (dr); > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, > "\tFAILED as dr address is invariant\n"); > ret = false; > break; > } > ... > > That FIXME was removed in the fix for PR46787, at r175704. > > The test-case fails in 4.8, and I guess from there onwards. > > > VII. > > The attached patch fixes the problem by returning a zero access function for > 'i' in dr_analyze_indices. > > [ But I can also imagine a solution similar to the graphite fix: > ... > @@ -3997,6 +3999,12 @@ find_data_references_in_stmt > dr = create_data_ref (nest, loop_containing_stmt (stmt), > ref->ref, stmt, ref->is_read); > gcc_assert (dr != NULL); > + if (DR_NUM_DIMENSIONS (dr) == 0) > + { > + datarefs->release (); > + return false; > + } > + > datarefs->safe_push (dr); > } > references.release (); > ... > > I'm not familiar enough with the dependency analysis code to know where > exactly this should be fixed. ] > > Bootstrapped and reg-tested on x86_64. > > OK for trunk? > > OK for release branches?
Bah - attachement [please post patches inline] So without quote ;) Doesnt' the same issue apply to > unsigned int *p; > > static void __attribute__((noinline, noclone)) > foo (void) > { > unsigned int z; > > for (z = 0; z < N; ++z) > ++(*p); > } thus when we have a MEM_REF[p_1]? SCEV will not analyze its evolution to a POLYNOMIAL_CHREC and thus access_fns will be NULL again. I think avoiding a NULL access_fns is ok but it should be done unconditionally, not only for the DECL_P case. Thanks, Richard.