Hi,
This patch set generates a new control flow graph for the vectorized loop and
its peeled loops. At the moment, the CFG for a vectorized loop is complicated
and sub-optimal. The major issues are:
A) For both the prologue and the vectorized loop, it generates a guard/branch
before the loop, checking whether the following (prologue/vectorized) loop
should be skipped. It also generates a guard/branch after the loop, checking
whether the next (vectorized/epilogue) loop should be skipped.
B) Depending on how conditional set is supported by the target, it may generate
one additional if-statement (branch) setting the niters for the prologue loop.
C) In the worst case, up to 4 branch instructions need to be executed before
the vectorized loop is entered.
D) For loops without enough niters, it checks and executes some
(niters_prologue) iterations with the prologue loop, then checks whether the
remaining number of iterations (niters - niters_prologue) is enough for
vectorization; if not, it skips the vectorized loop and continues with the
epilogue loop. This is bad, since the vectorized loop isn't executed at all
after all the hassle.
This patch set improves this by merging the different checks, so that only 2
branch instructions (which could be reduced further in combination with loop
versioning) are executed before the vectorized loop; it does better
compile-time analysis in order to avoid prologue/epilogue peeling where
possible; and it improves code generation in various ways (live overflow
handling, generating short live ranges). In terms of implementation, it tries
to factor the SSA updating code out of the CFG changing code; I think this may
help future work on replacing slpeel_* with a generic GIMPLE loop copier.
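To illustrate the idea behind merging the checks (issue D above), here is a
rough sketch in plain C. It is only a scalar model under my own assumptions,
not the CFG the patch actually generates: sum_peeled, VF and the 4-wide
unrolled body standing in for the vector loop are hypothetical names made up
for this example. The point is that the "are there enough iterations?" test
runs before the prologue, so the prologue is never executed when the vectorized
loop would be skipped anyway.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 4 };  /* hypothetical vectorization factor, chosen for this sketch */

/* Sum a[0..n) using a peeled layout: prologue, "vector" loop, epilogue.
   niters_prolog models the iterations peeled for alignment.  */
long sum_peeled(const int *a, size_t n, size_t niters_prolog)
{
    long sum = 0;
    size_t i = 0;

    /* Assumption of this sketch: the prologue peels fewer than VF iterations.  */
    assert(niters_prolog < VF);

    /* Merged guard: decide up front whether the vector loop can run at all.
       If not, fall straight through to the scalar epilogue.  */
    if (n >= niters_prolog + VF) {
        /* Prologue: scalar iterations peeled before the vector loop.  */
        for (; i < niters_prolog; i++)
            sum += a[i];

        /* Stand-in for the vectorized loop: VF elements per iteration.  */
        for (; i + VF <= n; i += VF)
            sum += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }

    /* Epilogue: remaining scalar iterations (all of them if the guard failed).  */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

With n = 5 and niters_prolog = 3, the guard fails and all five iterations run
in the epilogue loop, instead of running three prologue iterations first and
only then finding out that the vector loop has to be skipped.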
So far there are 9 patches in the set. Patches [1-5] are small prerequisites
for the major change, which is done by patch 6. Patches [7-9] are small patches
that either address test cases or improve code generation. A final bootstrap
and test of the patch set is ongoing on x86_64 and AArch64. Assuming there are
no new failures, or that any will be fixed, are there any comments on this?
This is the first patch; it deletes useless code in tree-vect-loop-manip.c and
fixes an obvious code style issue.
Thanks,
bin
2016-09-01 Bin Cheng <bin.ch...@arm.com>
* tree-vect-loop-manip.c (slpeel_can_duplicate_loop_p): Fix code
style issue.
(vect_do_peeling_for_loop_bound, vect_do_peeling_for_alignment):
Remove useless code.
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 01d6bb1..3a3b0bc 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1003,9 +1003,9 @@ slpeel_can_duplicate_loop_p (const struct loop *loop, const_edge e)
gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
unsigned int num_bb = loop->inner? 5 : 2;
- /* All loops have an outer scope; the only case loop->outer is NULL is for
- the function itself. */
- if (!loop_outer (loop)
+ /* All loops have an outer scope; the only case loop->outer is NULL is for
+ the function itself. */
+ if (!loop_outer (loop)
|| loop->num_nodes != num_bb
|| !empty_block_p (loop->latch)
|| !single_exit (loop)
@@ -1786,7 +1786,6 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
struct loop *new_loop;
edge update_e;
basic_block preheader;
- int loop_num;
int max_iter;
tree cond_expr = NULL_TREE;
gimple_seq cond_expr_stmt_list = NULL;
@@ -1797,8 +1796,6 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
initialize_original_copy_tables ();
- loop_num = loop->num;
-
new_loop
= slpeel_tree_peel_loop_to_edge (loop, scalar_loop, single_exit (loop),
&ratio_mult_vf_name, ni_name, false,
@@ -1806,7 +1803,6 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
cond_expr, cond_expr_stmt_list,
0, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
gcc_assert (new_loop);
- gcc_assert (loop_num == loop->num);
slpeel_checking_verify_cfg_after_peeling (loop, new_loop);
/* A guard that controls whether the new_loop is to be executed or skipped
@@ -2053,8 +2049,6 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo, tree ni_name,
initialize_original_copy_tables ();
- gimple_seq stmts = NULL;
- gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
niters_of_prolog_loop = vect_gen_niters_for_prolog_loop (loop_vinfo,
ni_name,
&bound);