On Wed, Oct 11, 2017 at 2:05 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Mon, Oct 9, 2017 at 2:48 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>> Hi,
>>> For now distribution pass only handles the innermost loop.  This patch
>>> extends the pass to cover two-level innermost loop nest.  It also
>>> refactors code in pass_loop_distribution::execute for better reading.
>>> Note I restrict it to 2-level loop nest on purpose because of high
>>> cost in data dependence computation.  Some compilation time
>>> optimizations like reusing the data reference finding, data dependence
>>> computing, would require a rewrite of this pass like the proposed loop
>>> interchange implementation.  But that's another task.
>>>
>>> This patch introduces a temporary TODO for loop nest builtin partition
>>> which is covered by next two patches.
>>>
>>> With this patch, kernel loop in bwaves now can be distributed, thus
>>> exposed for further interchange.  This patch adds new test for matrix
>>> multiplication, as well as adjusts test strings of existing tests.
>>> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?
>>
>> @@ -714,9 +719,11 @@ ssa_name_has_uses_outside_loop_p (tree def, loop_p loop)
>>
>>    FOR_EACH_IMM_USE_FAST (use_p, imm_iter, def)
>>      {
>> -      gimple *use_stmt = USE_STMT (use_p);
>> -      if (!is_gimple_debug (use_stmt)
>> -          && loop != loop_containing_stmt (use_stmt))
>> +      if (is_gimple_debug (USE_STMT (use_p)))
>> +        continue;
>> +
>> +      basic_block use_bb = gimple_bb (USE_STMT (use_p));
>> +      if (use_bb == NULL || !flow_bb_inside_loop_p (loop, use_bb))
>>          return true;
>>
>> use_bb should never be NULL.
> Done.
>>
>> +  /* Don't support loop nest distribution under runtime alias check
>> +     since it's not likely to enable many vectorization opportunities.  */
>> +  if (loop->inner)
>> +    {
>> +      merge_dep_scc_partitions (rdg, &partitions, false);
>> +    }
>>
>> extra {}
> Done.
>>
>> +      /* Support loop nest distribution enclosing current innermost loop.
>> +         For the moment, we only support the innermost two-level loop nest.  */
>> +      if (flag_tree_loop_distribution
>> +          && outer->num > 0 && outer->inner == loop && loop->next == NULL
>>
>> The canonical check for is-this-non-root is loop_outer (outer) instead
>> of outer->num > 0.
> Done.
>>
>> +          && single_exit (outer)
>>
>> not sure how exits are counted but if the inner loop exits also the
>> outer loop do we correctly handle/reject this case?
> I tend to believe this can be handled if it's not rejected by
> niters/exit condition, but I am not very sure about this.
>>
>> -      if (nb_generated_loops + nb_generated_calls > 0)
>> -        {
>> -          changed = true;
>> -          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>> -                           loc, "Loop %d distributed: split to %d loops "
>> -                           "and %d library calls.\n",
>> -                           num, nb_generated_loops, nb_generated_calls);
>> +          if (nb_generated_loops + nb_generated_calls > 0)
>> +            {
>> +              changed = true;
>> +              dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>> +                               loc, "Loop%s %d distributed: split to %d loops "
>> +                               "and %d library calls.\n",
>> +                               loop_nest_p ? " nest" : "", loop->num,
>> +                               nb_generated_loops, nb_generated_call
>> ...
>>
>> can you adjust the printfs to say "loop nest distributed" in case we
>> distributed a nest?
> Done.
>>
>> Can you rewrite the iteration over the nest so it would theoretically
>> support arbitrary deep perfect nests?  Thus simply initialize
>> loop_nest_p less cleverly...
> Done.  I factored it out as a function "prepare_perfect_loop_nest".  I
> also tested the updated patch by enabling full loop nest distribution,
> there is no failure in bootstrap, regression test, spec benchmarks.  Of
> course, the final patch still only supports 2-level innermost loop nest.
>
> Is this OK?

Ok.

Thanks,
Richard.

> Thanks,
> bin
> 2017-10-04  Bin Cheng  <bin.ch...@arm.com>
>
>         * tree-loop-distribution.c: Adjust the general comment.
>         (NUM_PARTITION_THRESHOLD): New macro.
>         (ssa_name_has_uses_outside_loop_p): Support loop nest distribution.
>         (classify_partition): Skip builtin pattern of loop nest's inner loop.
>         (merge_dep_scc_partitions): New parameter ignore_alias_p and use it
>         in call to build_partition_graph.
>         (finalize_partitions): New parameter.  Make loop distribution more
>         conservative by fusing more partitions.
>         (distribute_loop): Don't do runtime alias check in case of loop nest
>         distribution.
>         (find_seed_stmts_for_distribution): New function.
>         (prepare_perfect_loop_nest): New function.
>         (pass_loop_distribution::execute): Refactor code finding seed stmts
>         and loop nest into above functions.  Support loop nest distribution.
>         Adjust dump information accordingly.
>
> gcc/testsuite/ChangeLog
> 2017-10-04  Bin Cheng  <bin.ch...@arm.com>
>
>         * gcc.dg/tree-ssa/ldist-7.c: Adjust test string.
>         * gcc.dg/tree-ssa/ldist-16.c: Ditto.
>         * gcc.dg/tree-ssa/ldist-25.c: Ditto.
>         * gcc.dg/tree-ssa/ldist-33.c: New test.
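
For readers following the thread, here is a rough source-level sketch of what
distributing a two-level loop nest means.  It is only a toy model: it is not
the contents of ldist-33.c, and the names fused_nest, distributed_nest,
nests_agree and check_distribution are made up for illustration.  The pass
itself works on GIMPLE, not on C source.  The sketch assumes the only
dependence is a forward one from S1 to S2 within the same iteration, which is
the kind of case where splitting the nest is safe.

```c
#include <assert.h>
#include <string.h>

#define N 16

/* Hypothetical example: one two-level nest with two statements.
   S2 reads the value S1 wrote in the same iteration, so the only
   dependence runs forward from S1 to S2.  */
void
fused_nest (int a[N][N], int b[N][N])
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      {
        a[i][j] = i * N + j;     /* S1 */
        b[i][j] = a[i][j] * 2;   /* S2 */
      }
}

/* The same computation after distribution: each statement gets its own
   copy of the whole nest, and the S1 nest runs before the S2 nest so
   the forward dependence is preserved.  */
void
distributed_nest (int a[N][N], int b[N][N])
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      a[i][j] = i * N + j;       /* S1 */

  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      b[i][j] = a[i][j] * 2;     /* S2 */
}

/* Return 1 if both versions produce identical arrays, 0 otherwise.  */
int
nests_agree (void)
{
  static int a1[N][N], b1[N][N], a2[N][N], b2[N][N];

  fused_nest (a1, b1);
  distributed_nest (a2, b2);
  return memcmp (a1, a2, sizeof a1) == 0
         && memcmp (b1, b2, sizeof b1) == 0;
}

/* Convenience wrapper for callers that just want a hard check.  */
void
check_distribution (void)
{
  assert (nests_agree ());
}
```

Calling check_distribution () from a test driver is enough to confirm the two
forms agree for this input.  Once split, each nest can be looked at separately
by later passes, which is the point made above about exposing the bwaves
kernel loop for interchange.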