On Tue, Sep 6, 2016 at 8:52 PM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> This is the main patch improving control flow graph for vectorized loop.  It
> generally rewrites loop peeling stuff in vectorizer.  As described in patch,
> for a typical loop to be vectorized like:
>
>        preheader:
>      LOOP:
>        header_bb:
>          loop_body
>          if (exit_loop_cond) goto exit_bb
>          else                goto header_bb
>        exit_bb:
>
> This patch peels prolog and epilog from the loop, adds guards skipping PROLOG
> and EPILOG for various conditions.  As a result, the changed CFG would look
> like:
>
>        guard_bb_1:
>          if (prefer_scalar_loop) goto merge_bb_1
>          else                    goto guard_bb_2
>
>        guard_bb_2:
>          if (skip_prolog) goto merge_bb_2
>          else             goto prolog_preheader
>
>        prolog_preheader:
>      PROLOG:
>        prolog_header_bb:
>          prolog_body
>          if (exit_prolog_cond) goto prolog_exit_bb
>          else                  goto prolog_header_bb
>        prolog_exit_bb:
>
>        merge_bb_2:
>
>        vector_preheader:
>      VECTOR LOOP:
>        vector_header_bb:
>          vector_body
>          if (exit_vector_cond) goto vector_exit_bb
>          else                  goto vector_header_bb
>        vector_exit_bb:
>
>        guard_bb_3:
>          if (skip_epilog) goto merge_bb_3
>          else             goto epilog_preheader
>
>        merge_bb_1:
>
>        epilog_preheader:
>      EPILOG:
>        epilog_header_bb:
>          epilog_body
>          if (exit_epilog_cond) goto merge_bb_3
>          else                  goto epilog_header_bb
>
>        merge_bb_3:
>
>
> Note this patch peels prolog and epilog only if it's necessary, as well as
> adds different guard_conditions/branches.  Also the first guard/branch could
> be further improved by merging it with loop versioning.
>
> Before this patch, up to 4 branch instructions need to be executed before the
> vectorized loop is reached in the worst case, while the number is reduced to
> 2 with this patch.  The patch also does better in compile time analysis to
> avoid unnecessary peeling/branching.
> From implementation's point of view, vectorizer needs to update induction
> variables and iteration bounds along with control flow changes.
> Unfortunately, it also becomes much harder to follow because slpeel_*
> functions update SSA by themselves, rather than using update_ssa interface.
> This patch tries to factor out SSA/IV/Niter_bound changes from CFG changes.
> This should make the implementation easier to read, and I think it may be a
> step forward to replace slpeel_* functions with generic GIMPLE loop copy
> interfaces as Richard suggested.

I've skimmed over the patch and it looks reasonable to me.

Ok.

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-09-01  Bin Cheng  <bin.ch...@arm.com>
>
>         * tree-vect-loop-manip.c (adjust_vec_debug_stmts): Don't release
>         adjust_vec automatically.
>         (slpeel_add_loop_guard): Remove param cond_expr_stmt_list.  Rename
>         param exit_bb to guard_to.
>         (slpeel_checking_verify_cfg_after_peeling):
>         (set_prologue_iterations):
>         (create_lcssa_for_virtual_phi): New func which is factored out from
>         slpeel_tree_peel_loop_to_edge.
>         (slpeel_tree_peel_loop_to_edge):
>         (iv_phi_p): New func.
>         (vect_can_advance_ivs_p): Call iv_phi_p.
>         (vect_update_ivs_after_vectorizer): Call iv_phi_p.  Directly insert
>         new gimple stmts in basic block.
>         (vect_do_peeling_for_loop_bound):
>         (vect_do_peeling_for_alignment):
>         (vect_gen_niters_for_prolog_loop): Rename to...
>         (vect_gen_prolog_loop_niters): ...Rename from.  Change parameters and
>         adjust implementation.
>         (vect_update_inits_of_drs): Fix code style issue.  Convert niters to
>         sizetype if necessary.
>         (vect_build_loop_niters): Move to here from tree-vect-loop.c.  Change
>         it to external function.
>         (vect_gen_scalar_loop_niters, vect_gen_vector_loop_niters): New.
>         (vect_gen_vector_loop_niters_mult_vf): New.
>         (slpeel_update_phi_nodes_for_loops): New.
>         (slpeel_update_phi_nodes_for_guard1): Reimplement.
>         (find_guard_arg, slpeel_update_phi_nodes_for_guard2): Reimplement.
>         (slpeel_update_phi_nodes_for_lcssa, vect_do_peeling): New.
>         * tree-vect-loop.c (vect_build_loop_niters): Move to file
>         tree-vect-loop-manip.c
>         (vect_generate_tmps_on_preheader): Delete.
>         (vect_transform_loop): Rename vectorization_factor to vf.  Call
>         vect_do_peeling instead of vect_do_peeling-* functions.
>         * tree-vectorizer.h (vect_do_peeling): New decl.
>         (vect_build_loop_niters, vect_gen_vector_loop_niters): New decls.
>         (vect_do_peeling_for_loop_bound): Delete.
>         (vect_do_peeling_for_alignment): Delete.
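
For readers following the quoted description, the guarded PROLOG / VECTOR LOOP /
EPILOG layout can be mimicked at source level in plain C.  This is only an
illustrative sketch, not code from the patch: VF, the "n >= 2 * VF - 1"
threshold standing in for prefer_scalar_loop, and the names prolog_niters,
sum_scalar and sum_peeled are all assumptions made up for the example.  The
real vectorizer does this on GIMPLE and also has to update SSA form, induction
variables and niter bounds, which the sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 4 /* hypothetical vectorization factor (lanes per vector) */

/* The original scalar loop: LOOP in the first diagram. */
long sum_scalar(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical helper: number of scalar iterations to run before a + i
   sits on a VF-element boundary.  The result is in [0, VF - 1]. */
static size_t prolog_niters(const int *a)
{
    size_t misalign = (size_t)(((uintptr_t)a / sizeof(int)) % VF);
    return misalign ? VF - misalign : 0;
}

/* Same result as sum_scalar, laid out like the peeled CFG. */
long sum_peeled(const int *a, size_t n)
{
    size_t i = 0;
    long sum = 0;

    assert(a != NULL || n == 0);

    /* guard_bb_1: with fewer than 2 * VF - 1 iterations the vector loop
       might not run even once after the prolog, so prefer the scalar
       path (merge_bb_1 -> EPILOG).  The threshold is an assumption. */
    if (n >= 2u * VF - 1u) {
        size_t np = prolog_niters(a);

        /* guard_bb_2: skip the prolog when nothing needs peeling. */
        if (np != 0) {
            do {                    /* PROLOG */
                sum += a[i];
                i++;
            } while (i < np);
        }

        /* merge_bb_2 -> VECTOR LOOP: guard_bb_1 guarantees at least one
           full iteration, so a do-while matches the diagram.  The inner
           loop over j stands in for one vector instruction. */
        long lane[VF] = {0};
        size_t vec_end = i + (n - i) / VF * VF;
        do {
            for (size_t j = 0; j < VF; j++)
                lane[j] += a[i + j];
            i += VF;
        } while (i < vec_end);
        for (size_t j = 0; j < VF; j++)
            sum += lane[j];

        /* guard_bb_3: skip the epilog when no iterations remain. */
        if (i == n)
            return sum;             /* merge_bb_3 */
    }

    /* merge_bb_1 -> EPILOG: leftover (or all) iterations in scalar form. */
    for (; i < n; i++)
        sum += a[i];
    return sum;                     /* merge_bb_3 */
}
```

In this sketch only two conditional branches (guard_bb_1 and guard_bb_2) sit
between function entry and the vector loop, which is the worst case the quoted
description gives for the new CFG.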