On Wed, Jun 14, 2017 at 10:15 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Wed, Jun 14, 2017 at 9:53 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>> Hi, >>>> During the work I ran into a latent bug for distributing. For the moment >>>> we sort statements >>>> in dominance order, but that's not enough because basic blocks may be >>>> sorted in reverse order >>>> of execution flow. This results in wrong data dependence direction later. >>>> This patch fixes >>>> the issue by sorting in topological order. >>>> >>>> Bootstrap and test on x86_64 and AArch64. Is it OK? >>> >>> I suppose you are fixing >>> >>> static int >>> pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir, >>> vec<data_reference_p> drs1, >>> vec<data_reference_p> drs2) >>> { >>> ... >>> /* Re-shuffle data-refs to be in dominator order. */ >>> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >>> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >>> { >>> std::swap (dr1, dr2); >>> this_dir = -this_dir; >>> } >>> >>> but then for stmts that are not "ordered" by RPO or DOM like >>> >>> if (flag) >>> ... = dr1; >>> else >>> ... = dr2; >>> >>> this doesn't avoid spurious swaps? Also the code was basically >> No, this is mainly for below case: >> if (flag) >> { >> partition1: arr[i] = x; >> } >> partition2: arr[i] = y; >> >> function pg_add_dependence_edges is like: >> /* Re-shuffle data-refs to be in dominator order. */ >> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >> { >> std::swap (dr1, dr2); >> this_dir = -this_dir; >> } >> //... >> else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) >> { >> if (DDR_REVERSED_P (ddr)) >> { >> std::swap (dr1, dr2); >> this_dir = -this_dir; >> } >> /* Known dependences can still be unordered througout the >> iteration space, see gcc.dg/tree-ssa/ldist-16.c. */ >> if (DDR_NUM_DIST_VECTS (ddr) != 1) >> this_dir = 2; >> /* If the overlap is exact preserve stmt order. */ >> else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1)) >> ; >> else >> { >> /* Else as the distance vector is lexicographic positive >> swap the dependence direction. */ >> this_dir = -this_dir; >> } >> } >> For non-ZERO distance vector dependence, the second part always >> computes src->dest dependence info correctly, as well as edge >> direction of PG. For ZERO distance vector dependence, we rely on the >> swap part (thus topological order) to get correct dependence >> direction. For mentioned case, the two data references are unordered >> in dominance relation, but ordered in RPO. This is why DOM is not >> enough. For if-then-else case, the order actually doesn't matter, and >> the references are unordered in either dominance relation or RPO. >> Specifically, src->dest is always computed correctly for non-ZERO >> distance vector cases, no matter <dr1, dr2> or <dr2, dr1> is passed to >> data dependence analyzer. As for ZERO distance vector (exact >> overlap), the order doesn't matter either because they control >> dependent on the same condition. We can simply assume an arbitrary >> order for it. > > Ok, if it only is an issue for zero-distance then yes, I agree. Yeah, so for distribution, I think we can further simplify pg_add_dependence_edge because swap is only necessary for zero-distance cases. > >>> copied from tree-data-refs.c:find_data_references_in_loop which >>> does iterate over get_loop_body_in_dom_order as well. So isn't the >>> issue latent there as well? >> In theory maybe. In practice, this is not a problem at all since loop >> distribution is the only one handles control dependence so far. > > You mean the only one running into the bogus BB ordering case. > I don't see how handling control dependences factor in here. Yes, that's what I meant. The ordering issue doesn't exist if there is no control dependence between basic blocks in loop body I think. I didn't realize the autopar pass. > >>> >>> That said, what about those "unordered" stmts? I suppose >>> dependence analysis still happily computes a dependence >>> distance but in reality we'd have to consider both execution >>> orders? >> As explained, there is no need to consider both orders. GCC doesn't >> really support control dependence parallelization? > > I think autopar supports an arbitrary CFG inside the loops but as it > will never split them it won't change stmt ordering for zero-distance. > > That said, if dependence info can be incorrect if applied to a loop > we should fixup tree-data-refs.c as well. It might be useful > to make get_loop_body_in_rpo_order available then (and eventually > all _in_dom_order users can work with rpo order as well thus we > can replace it entirely as a second step). I can work on that as a followup patch. Is it OK?
Thanks, bin > > Thanks, > Richard. > >> Thanks, >> bin >>> >>> Thanks, >>> Richard. >>> >>> >>>> >>>> Thanks, >>>> bin >>>> 2017-06-07 Bin Cheng <bin.ch...@arm.com> >>>> >>>> * tree-loop-distribution.c (bb_top_order_index): New. >>>> (bb_top_order_index_size, bb_top_order_cmp): New. >>>> (stmts_from_loop): Use topological order. >>>> (pass_loop_distribution::execute): Compute topological order for. >>>> basic blocks.