On Tue, Sep 6, 2016 at 8:52 PM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> This is the main patch improving the control flow graph for vectorized
> loops.  It largely rewrites the loop peeling code in the vectorizer.  As
> described in the patch, for a typical loop to be vectorized like:
>
>        preheader:
>      LOOP:
>        header_bb:
>          loop_body
>          if (exit_loop_cond) goto exit_bb
>          else                goto header_bb
>        exit_bb:
>
> This patch peels a prolog and an epilog from the loop and adds guards that
> skip PROLOG and EPILOG under various conditions.  As a result, the changed
> CFG looks like:
>
>        guard_bb_1:
>          if (prefer_scalar_loop) goto merge_bb_1
>          else                    goto guard_bb_2
>
>        guard_bb_2:
>          if (skip_prolog) goto merge_bb_2
>          else             goto prolog_preheader
>
>        prolog_preheader:
>      PROLOG:
>        prolog_header_bb:
>          prolog_body
>          if (exit_prolog_cond) goto prolog_exit_bb
>          else                  goto prolog_header_bb
>        prolog_exit_bb:
>
>        merge_bb_2:
>
>        vector_preheader:
>      VECTOR LOOP:
>        vector_header_bb:
>          vector_body
>          if (exit_vector_cond) goto vector_exit_bb
>          else                  goto vector_header_bb
>        vector_exit_bb:
>
>        guard_bb_3:
>          if (skip_epilog) goto merge_bb_3
>          else             goto epilog_preheader
>
>        merge_bb_1:
>
>        epilog_preheader:
>      EPILOG:
>        epilog_header_bb:
>          epilog_body
>          if (exit_epilog_cond) goto merge_bb_3
>          else                  goto epilog_header_bb
>
>        merge_bb_3:
>
>
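For readers who prefer code to a CFG listing, the layout above can be sketched in plain C. This is only an illustration under stated assumptions, not code from the patch: the function name sum_peeled, the factor VF, the 2 * VF threshold in the first guard and the alignment test in the second guard are all hypothetical, and the inner lane loop merely stands in for a single vector operation. The labels follow the names used in the listing.

```c
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 };  /* hypothetical vectorization factor */

/* Sum a[0..n) using the guard/prolog/vector/epilog layout sketched in
   the message.  Illustrative only; none of this is GCC code.  */
long sum_peeled(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;
    size_t prolog_niters, vector_end, lane;

    /* The CFG in the message assumes the loop body runs at least once,
       so the empty case is handled separately here.  */
    if (n == 0)
        return 0;

    /* guard_bb_1: with too few iterations, prefer the scalar loop
       (hypothetical threshold).  */
    if (n < 2 * VF)
        goto merge_bb_1;

    /* guard_bb_2: skip the prolog when a[] already sits on a VF-element
       boundary (hypothetical alignment test).  */
    prolog_niters = (VF - ((uintptr_t)a / sizeof(int)) % VF) % VF;
    if (prolog_niters == 0)
        goto merge_bb_2;

    /* PROLOG: scalar iterations until the access is aligned.  */
    do {
        sum += a[i++];
    } while (i < prolog_niters);

merge_bb_2:
    /* VECTOR LOOP: VF elements per iteration.  The first guard ensures
       at least one full vector iteration remains at this point.  */
    vector_end = i + (n - i) / VF * VF;
    do {
        for (lane = 0; lane < VF; lane++)
            sum += a[i + lane];
        i += VF;
    } while (i < vector_end);

    /* guard_bb_3: skip the epilog when nothing is left over.  */
    if (i == n)
        goto merge_bb_3;

merge_bb_1:
    /* EPILOG: remaining scalar iterations, or the whole loop when the
       first guard chose the scalar path.  */
    do {
        sum += a[i++];
    } while (i < n);

merge_bb_3:
    return sum;
}
```

On the vector path, only the first two guards are evaluated before the vector loop is entered, which is where the figure of 2 branches quoted further down comes from.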
> Note that this patch peels the prolog and epilog only if necessary, and the
> same goes for the different guard conditions/branches.  Also, the first
> guard/branch could be further improved by merging it with loop versioning.
>
> Before this patch, up to 4 branch instructions had to be executed before the
> vectorized loop was reached in the worst case; with this patch the number is
> reduced to 2.  The patch also does better compile-time analysis to avoid
> unnecessary peeling/branching.
> From the implementation's point of view, the vectorizer needs to update
> induction variables and iteration bounds along with the control flow changes.
> Unfortunately, the code also becomes much harder to follow because the
> slpeel_* functions update SSA by themselves rather than using the update_ssa
> interface.  This patch tries to factor out SSA/IV/Niter_bound changes from
> CFG changes.  This should make the implementation easier to read, and I
> think it may be a step towards replacing the slpeel_* functions with generic
> GIMPLE loop copy interfaces, as Richard suggested.
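The iteration-bound and induction-variable bookkeeping mentioned above comes down to simple arithmetic once the prolog count is known. The sketch below is a rough illustration under assumptions, not the patch's implementation: struct peel_niters, split_niters and iv_value_after are hypothetical names, vf is assumed to be nonzero, and the induction variable is assumed to be a simple affine one (base plus step times iteration count).

```c
#include <stddef.h>

/* How the scalar iteration count is divided among the three loops.
   All names here are hypothetical and do not appear in GCC.  */
struct peel_niters {
    size_t prolog;          /* scalar iterations run by the prolog */
    size_t vector;          /* iterations of the vector loop */
    size_t vector_mult_vf;  /* scalar iterations covered by the vector loop */
    size_t epilog;          /* scalar iterations left for the epilog */
};

/* Split NITERS scalar iterations given a prolog count and a
   vectorization factor VF (assumed nonzero).  */
struct peel_niters split_niters(size_t niters, size_t prolog, size_t vf)
{
    struct peel_niters r;

    if (prolog > niters)
        prolog = niters;  /* the prolog cannot run more than the loop */
    r.prolog = prolog;
    r.vector = (niters - prolog) / vf;
    r.vector_mult_vf = r.vector * vf;
    r.epilog = niters - prolog - r.vector_mult_vf;
    return r;
}

/* Value of an affine induction variable (BASE, STEP) after ITERS scalar
   iterations.  This is the initial value the next loop must start from.  */
long iv_value_after(long base, long step, size_t iters)
{
    return base + step * (long)iters;
}
```

For example, with 22 scalar iterations, a prolog of 3 and vf of 4, the vector loop runs 4 times and covers 16 scalar iterations, leaving 3 for the epilog; an induction variable with base 0 and step 1 then enters the vector loop at 3 and the epilog at 19.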

I've skimmed over the patch and it looks reasonable to me.

Ok.

Thanks,
Richard.


> Thanks,
> bin
>
> 2016-09-01  Bin Cheng  <bin.ch...@arm.com>
>
>         * tree-vect-loop-manip.c (adjust_vec_debug_stmts): Don't release
>         adjust_vec automatically.
>         (slpeel_add_loop_guard): Remove param cond_expr_stmt_list.  Rename
>         param exit_bb to guard_to.
>         (slpeel_checking_verify_cfg_after_peeling):
>         (set_prologue_iterations):
>         (create_lcssa_for_virtual_phi): New func which is factored out from
>         slpeel_tree_peel_loop_to_edge.
>         (slpeel_tree_peel_loop_to_edge):
>         (iv_phi_p): New func.
>         (vect_can_advance_ivs_p): Call iv_phi_p.
>         (vect_update_ivs_after_vectorizer): Call iv_phi_p.  Directly insert
>         new gimple stmts in basic block.
>         (vect_do_peeling_for_loop_bound):
>         (vect_do_peeling_for_alignment):
>         (vect_gen_niters_for_prolog_loop): Rename to...
>         (vect_gen_prolog_loop_niters): ...Rename from.  Change parameters and
>         adjust implementation.
>         (vect_update_inits_of_drs): Fix code style issue.  Convert niters to
>         sizetype if necessary.
>         (vect_build_loop_niters): Move to here from tree-vect-loop.c.  Change
>         it to external function.
>         (vect_gen_scalar_loop_niters, vect_gen_vector_loop_niters): New.
>         (vect_gen_vector_loop_niters_mult_vf): New.
>         (slpeel_update_phi_nodes_for_loops): New.
>         (slpeel_update_phi_nodes_for_guard1): Reimplement.
>         (find_guard_arg, slpeel_update_phi_nodes_for_guard2): Reimplement.
>         (slpeel_update_phi_nodes_for_lcssa, vect_do_peeling): New.
>         * tree-vect-loop.c (vect_build_loop_niters): Move to file
>         tree-vect-loop-manip.c.
>         (vect_generate_tmps_on_preheader): Delete.
>         (vect_transform_loop): Rename vectorization_factor to vf.  Call
>         vect_do_peeling instead of vect_do_peeling_* functions.
>         * tree-vectorizer.h (vect_do_peeling): New decl.
>         (vect_build_loop_niters, vect_gen_vector_loop_niters): New decls.
>         (vect_do_peeling_for_loop_bound): Delete.
>         (vect_do_peeling_for_alignment): Delete.
