On Wed, Jun 14, 2017 at 11:25 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Wed, Jun 14, 2017 at 10:15 AM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Wed, Jun 14, 2017 at 9:53 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>>> Hi, >>>>> During the work I ran into a latent bug for distributing. For the moment >>>>> we sort statements >>>>> in dominance order, but that's not enough because basic blocks may be >>>>> sorted in reverse order >>>>> of execution flow. This results in wrong data dependence direction >>>>> later. This patch fixes >>>>> the issue by sorting in topological order. >>>>> >>>>> Bootstrap and test on x86_64 and AArch64. Is it OK? >>>> >>>> I suppose you are fixing >>>> >>>> static int >>>> pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir, >>>> vec<data_reference_p> drs1, >>>> vec<data_reference_p> drs2) >>>> { >>>> ... >>>> /* Re-shuffle data-refs to be in dominator order. */ >>>> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >>>> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >>>> { >>>> std::swap (dr1, dr2); >>>> this_dir = -this_dir; >>>> } >>>> >>>> but then for stmts that are not "ordered" by RPO or DOM like >>>> >>>> if (flag) >>>> ... = dr1; >>>> else >>>> ... = dr2; >>>> >>>> this doesn't avoid spurious swaps? Also the code was basically >>> No, this is mainly for below case: >>> if (flag) >>> { >>> partition1: arr[i] = x; >>> } >>> partition2: arr[i] = y; >>> >>> function pg_add_dependence_edges is like: >>> /* Re-shuffle data-refs to be in dominator order. */ >>> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >>> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >>> { >>> std::swap (dr1, dr2); >>> this_dir = -this_dir; >>> } >>> //... >>> else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) >>> { >>> if (DDR_REVERSED_P (ddr)) >>> { >>> std::swap (dr1, dr2); >>> this_dir = -this_dir; >>> } >>> /* Known dependences can still be unordered througout the >>> iteration space, see gcc.dg/tree-ssa/ldist-16.c. */ >>> if (DDR_NUM_DIST_VECTS (ddr) != 1) >>> this_dir = 2; >>> /* If the overlap is exact preserve stmt order. */ >>> else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1)) >>> ; >>> else >>> { >>> /* Else as the distance vector is lexicographic positive >>> swap the dependence direction. */ >>> this_dir = -this_dir; >>> } >>> } >>> For non-ZERO distance vector dependence, the second part always >>> computes src->dest dependence info correctly, as well as edge >>> direction of PG. For ZERO distance vector dependence, we rely on the >>> swap part (thus topological order) to get correct dependence >>> direction. For mentioned case, the two data references are unordered >>> in dominance relation, but ordered in RPO. This is why DOM is not >>> enough. For if-then-else case, the order actually doesn't matter, and >>> the references are unordered in either dominance relation or RPO. >>> Specifically, src->dest is always computed correctly for non-ZERO >>> distance vector cases, no matter <dr1, dr2> or <dr2, dr1> is passed to >>> data dependence analyzer. As for ZERO distance vector (exact >>> overlap), the order doesn't matter either because they control >>> dependent on the same condition. We can simply assume an arbitrary >>> order for it. >> >> Ok, if it only is an issue for zero-distance then yes, I agree. > Yeah, so for distribution, I think we can further simplify > pg_add_dependence_edge because swap is only necessary for > zero-distance cases. >> >>>> copied from tree-data-refs.c:find_data_references_in_loop which >>>> does iterate over get_loop_body_in_dom_order as well. So isn't the >>>> issue latent there as well? >>> In theory maybe. In practice, this is not a problem at all since loop >>> distribution is the only one handles control dependence so far. >> >> You mean the only one running into the bogus BB ordering case. >> I don't see how handling control dependences factor in here. > Yes, that's what I meant. The ordering issue doesn't exist if there > is no control dependence between basic blocks in loop body I think. I > didn't realize the autopar pass. >> >>>> >>>> That said, what about those "unordered" stmts? I suppose >>>> dependence analysis still happily computes a dependence >>>> distance but in reality we'd have to consider both execution >>>> orders? >>> As explained, there is no need to consider both orders. GCC doesn't >>> really support control dependence parallelization? >> >> I think autopar supports an arbitrary CFG inside the loops but as it >> will never split them it won't change stmt ordering for zero-distance. >> >> That said, if dependence info can be incorrect if applied to a loop >> we should fixup tree-data-refs.c as well. It might be useful >> to make get_loop_body_in_rpo_order available then (and eventually >> all _in_dom_order users can work with rpo order as well thus we >> can replace it entirely as a second step). > I can work on that as a followup patch. Is it OK?
Yes. Thanks, Richard. > Thanks, > bin >> >> Thanks, >> Richard. >> >>> Thanks, >>> bin >>>> >>>> Thanks, >>>> Richard. >>>> >>>> >>>>> >>>>> Thanks, >>>>> bin >>>>> 2017-06-07 Bin Cheng <bin.ch...@arm.com> >>>>> >>>>> * tree-loop-distribution.c (bb_top_order_index): New. >>>>> (bb_top_order_index_size, bb_top_order_cmp): New. >>>>> (stmts_from_loop): Use topological order. >>>>> (pass_loop_distribution::execute): Compute topological order for. >>>>> basic blocks.