On Wed, Oct 16, 2013 at 2:02 AM, Richard Biener <rguent...@suse.de> wrote: > On Tue, 15 Oct 2013, Cong Hou wrote: > >> Thank you for your reminder, Jeff! I just noticed Richard's comment. I >> have modified the patch according to that. >> >> The new patch is attached. > > (posting patches inline is easier for review, now you have to deal > with no quoting markers ;)) > > Comments inline. > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index 8a38316..2637309 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,8 @@ > +2013-10-15 Cong Hou <co...@google.com> > + > + * tree-vect-loop-manip.c (vect_loop_versioning): Hoist loop invariant > + statement that contains data refs with zero-step. > + > 2013-10-14 David Malcolm <dmalc...@redhat.com> > > * dumpfile.h (gcc::dump_manager): New class, to hold state > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog > index 075d071..9d0f4a5 100644 > --- a/gcc/testsuite/ChangeLog > +++ b/gcc/testsuite/ChangeLog > @@ -1,3 +1,7 @@ > +2013-10-15 Cong Hou <co...@google.com> > + > + * gcc.dg/vect/pr58508.c: New test. > + > 2013-10-14 Tobias Burnus <bur...@net-b.de> > > PR fortran/58658 > diff --git a/gcc/testsuite/gcc.dg/vect/pr58508.c > b/gcc/testsuite/gcc.dg/vect/pr58508.c > new file mode 100644 > index 0000000..cb22b50 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/pr58508.c > @@ -0,0 +1,20 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ > + > + > +/* The GCC vectorizer generates loop versioning for the following loop > + since there may exist aliasing between A and B. The predicate checks > + if A may alias with B across all iterations. Then for the loop in > + the true body, we can assert that *B is a loop invariant so that > + we can hoist the load of *B before the loop body. */ > + > +void foo (int* a, int* b) > +{ > + int i; > + for (i = 0; i < 100000; ++i) > + a[i] = *b + 1; > +} > + > + > +/* { dg-final { scan-tree-dump-times "hoist" 2 "vect" } } */ > +/* { dg-final { cleanup-tree-dump "vect" } } */ > diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c > index 574446a..f4fdec2 100644 > --- a/gcc/tree-vect-loop-manip.c > +++ b/gcc/tree-vect-loop-manip.c > @@ -2477,6 +2477,92 @@ vect_loop_versioning (loop_vec_info loop_vinfo, > adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); > } > > > Note that applying this kind of transform at this point invalidates > some of the earlier analysis the vectorizer performed (namely the > def-kind which now effectively gets vect_external_def from > vect_internal_def). In this case it doesn't seem to cause any > issues (we re-compute the def-kind everytime we need it (how wasteful)). > > + /* Extract load and store statements on pointers with zero-stride > + accesses. */ > + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) > + { > + /* In the loop body, we iterate each statement to check if it is a load > + or store. Then we check the DR_STEP of the data reference. If > + DR_STEP is zero, then we will hoist the load statement to the loop > + preheader, and move the store statement to the loop exit. */ > > We don't move the store yet. Micha has a patch pending that enables > vectorization of zero-step stores. > > + for (gimple_stmt_iterator si = gsi_start_bb (loop->header); > + !gsi_end_p (si);) > > While technically ok now (vectorized loops contain a single basic block) > please use LOOP_VINFO_BBS () to get at the vector of basic-blcoks > and iterate over them like other code does.
Have done it. > > + { > + gimple stmt = gsi_stmt (si); > + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); > + > + if (dr && integer_zerop (DR_STEP (dr))) > + { > + if (DR_IS_READ (dr)) > + { > + if (dump_enabled_p ()) > + { > + dump_printf_loc > + (MSG_NOTE, vect_location, > + "hoist the statement to outside of the loop "); > > "hoisting out of the vectorized loop: " > > + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); > + dump_printf (MSG_NOTE, "\n"); > + } > + > + gsi_remove (&si, false); > + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), > stmt); > > Note that this will result in a bogus VUSE on the stmt at this point which > will be only fixed because of implementation details of loop versioning. > Either get the correct VUSE from the loop header virtual PHI node > preheader edge (if there is none then the current VUSE is the correct one > to use) or clear it. I just cleared the VUSE since I noticed that after the vectorization pass the correct VUSE is reassigned to the load. > > + } > + /* TODO: We also consider vectorizing loops containing zero-step > + data refs as writes. For example: > + > + int a[N], *s; > + for (i = 0; i < N; i++) > + *s += a[i]; > + > + In this case the write to *s can be also moved after the > + loop. */ > > Note that you then invalidate even more vectorizer analysis - you > basically introduce a scalar reduction (you have to add a PHI node). > Which means that the transform has to happen elsewhere. > > As Jakub now tries with if-conversion this would also be a candidate > for applying the loop versioning before even starting the rest of the > vectorizer analysis code. > > That said, I'd remove the TODO at this point because it's clearly not > possible to implement just here ;) Yes. This comment is removed. > > + continue; > + } > + else if (!dr) > + { > + bool hoist = true; > + for (size_t i = 0; i < gimple_num_ops (stmt); i++) > > You are checking all kinds of statements, including assignments > of which you are also checking the LHS ... restricting to > assignments you can start walking at i = 1. Done. > > + { > + tree op = gimple_op (stmt, i); > + if (TREE_CODE (op) == INTEGER_CST > + || TREE_CODE (op) == REAL_CST) > > There are other constants as well - just check > > if (is_gimple_min_invariant (op)) > > + continue; > + if (TREE_CODE (op) == SSA_NAME) > + { > + gimple def = SSA_NAME_DEF_STMT (op); > + if (def == stmt > > with starting at op 1 you can drop this. > > + || gimple_nop_p (def) > + || !flow_bb_inside_loop_p (loop, gimple_bb (def))) > + continue; > + } > > Note that you fail to hoist if-converted code this way because > op1 of > > name_1 = a_2 < b_4 ? x_5 : y_6; > > is 'a_2 < b_4', a tree expression and not an SSA name (ugh). Maybe > we don't care (it's just a missed optimization), but if you are > restricting yourself to hoisting assignments without a data-ref > then you can walk over SSA uses on the stmt (instead of over gimple > ops) with > > FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_USE) > > which would automagically take care of that case. I used FOR_EACH_SSA_TREE_OPERAND() here, and then I don't have to deal with different constant types. > > Can you add a testcase which involves invariant if-conversion? > Can you add a testcase with just an invariant store to make sure > we don't wreck it? Can you add a testcase with an invariant store > and load (the reduction case), too? The if-conversion case is added. But the current vectorizer will not vectorize loops with stores to zero-stride memory access (vect_analyze_loop() fails in this case). Are you sure to add the testcase with this? (I still added two tests for those two cases). > > + hoist = false; > + break; > + } > > For example you'll hoist all labels this way (no ops), as well as the > loop exit GIMPLE_COND in case it's operands were loop invariant, > breaking the CFG ... so please add && is_gimple_assign () to the > if (!dr) check ;) Done. I appreciate your detailed comments! The new patch is pasted below (since tabs cannot show here I also attached a text file with the patch including tabs). thanks, Cong diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..2637309 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2013-10-15 Cong Hou <co...@google.com> + + * tree-vect-loop-manip.c (vect_loop_versioning): Hoist loop invariant + statement that contains data refs with zero-step. + 2013-10-14 David Malcolm <dmalc...@redhat.com> * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 075d071..9d0f4a5 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2013-10-15 Cong Hou <co...@google.com> + + * gcc.dg/vect/pr58508.c: New test. + 2013-10-14 Tobias Burnus <bur...@net-b.de> PR fortran/58658 diff --git a/gcc/testsuite/gcc.dg/vect/pr58508.c b/gcc/testsuite/gcc.dg/vect/pr58508.c new file mode 100644 index 0000000..a3f6a06 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr58508.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ + + +/* The GCC vectorizer generates loop versioning for the following loop + since there may exist aliasing between A and B. The predicate checks + if A may alias with B across all iterations. Then for the loop in + the true body, we can assert that *B is a loop invariant so that + we can hoist the load of *B before the loop body. */ + +void test1 (int* a, int* b) +{ + int i; + for (i = 0; i < 100000; ++i) + a[i] = *b + 1; +} + + +/* A test case with ifcvt transformation. */ + +void test2 (int* a, int* b) +{ + int i, t; + for (i = 0; i < 10000; ++i) + { + if (*b > 0) + t = *b * 2; + else + t = *b / 2; + a[i] = t; + } +} + +/* A test case in which the store in the loop can be moved outside + in the versioned loop with alias checks. Note this loop won't + be vectorized. */ + +void test3 (int* a, int* b) +{ + int i; + for (i = 0; i < 100000; ++i) + *a += b[i]; +} + +/* A test case in which the load and store in the loop to b + can be moved outside in the versioned loop with alias checks. + Note this loop won't be vectorized. */ + +void test4 (int* a, int* b) +{ + int i; + for (i = 0; i < 100000; ++i) + { + *b += a[i]; + a[i] = *b; + } +} + +/* { dg-final { scan-tree-dump-times "hoist" 6 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 574446a..ff0403b 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -2477,6 +2477,88 @@ vect_loop_versioning (loop_vec_info loop_vinfo, adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); } + + /* Extract load statements on memrefs with zero-stride accesses. */ + + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) + { + /* In the loop body, we iterate each statement to check if it is a load. + Then we check the DR_STEP of the data reference. If DR_STEP is zero, + then we will hoist the load statement to the loop preheader. */ + + basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo); + int nbbs = loop->num_nodes; + + for (int i = 0; i < nbbs; ++i) + { + for (gimple_stmt_iterator si = gsi_start_bb (bbs[i]); + !gsi_end_p (si);) + { + gimple stmt = gsi_stmt (si); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + + if (dr && integer_zerop (DR_STEP (dr))) + { + if (DR_IS_READ (dr)) + { + if (dump_enabled_p ()) + { + dump_printf_loc + (MSG_NOTE, vect_location, + "hoisting out of the vectorized loop: "); + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); + dump_printf (MSG_NOTE, "\n"); + } + + gimple_set_vuse (stmt, NULL); + gsi_remove (&si, false); + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), + stmt); + } + continue; + } + else if (!dr && is_gimple_assign (stmt)) + { + bool hoist = true; + ssa_op_iter iter; + tree var; + + /* We hoist a statement if all SSA uses in it are defined + outside of the loop. */ + FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE) + { + gimple def = SSA_NAME_DEF_STMT (var); + if (!gimple_nop_p (def) + && flow_bb_inside_loop_p (loop, gimple_bb (def))) + { + hoist = false; + break; + } + } + + if (hoist) + { + gsi_remove (&si, false); + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), + stmt); + + if (dump_enabled_p ()) + { + dump_printf_loc + (MSG_NOTE, vect_location, + "hoisting out of the vectorized loop: "); + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); + dump_printf (MSG_NOTE, "\n"); + } + continue; + } + } + gsi_next (&si); + } + } + } + /* End loop-exit-fixes after versioning. */ if (cond_expr_stmt_list) > > + if (hoist) > + { > + gsi_remove (&si, false); > + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), > stmt); > + > + if (dump_enabled_p ()) > + { > + dump_printf_loc > + (MSG_NOTE, vect_location, > + "hoist the statement to outside of the loop "); > + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); > + dump_printf (MSG_NOTE, "\n"); > + } > + continue; > + } > + } > + gsi_next (&si); > + } > + } > + > /* End loop-exit-fixes after versioning. */ > > if (cond_expr_stmt_list) > >> >> thanks, >> Cong >> >> >> On Tue, Oct 15, 2013 at 12:33 PM, Jeff Law <l...@redhat.com> wrote: >> > On 10/14/13 17:31, Cong Hou wrote: >> >> >> >> Any comment on this patch? >> > >> > Richi replied in the BZ you opened. >> > >> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58508 >> > >> > Essentially he said emit the load on the edge rather than in the block >> > itself. >> > jeff >> >
diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..2637309 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2013-10-15 Cong Hou <co...@google.com> + + * tree-vect-loop-manip.c (vect_loop_versioning): Hoist loop invariant + statement that contains data refs with zero-step. + 2013-10-14 David Malcolm <dmalc...@redhat.com> * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 075d071..9d0f4a5 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2013-10-15 Cong Hou <co...@google.com> + + * gcc.dg/vect/pr58508.c: New test. + 2013-10-14 Tobias Burnus <bur...@net-b.de> PR fortran/58658 diff --git a/gcc/testsuite/gcc.dg/vect/pr58508.c b/gcc/testsuite/gcc.dg/vect/pr58508.c new file mode 100644 index 0000000..a3f6a06 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr58508.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ + + +/* The GCC vectorizer generates loop versioning for the following loop + since there may exist aliasing between A and B. The predicate checks + if A may alias with B across all iterations. Then for the loop in + the true body, we can assert that *B is a loop invariant so that + we can hoist the load of *B before the loop body. */ + +void test1 (int* a, int* b) +{ + int i; + for (i = 0; i < 100000; ++i) + a[i] = *b + 1; +} + + +/* A test case with ifcvt transformation. */ + +void test2 (int* a, int* b) +{ + int i, t; + for (i = 0; i < 10000; ++i) + { + if (*b > 0) + t = *b * 2; + else + t = *b / 2; + a[i] = t; + } +} + +/* A test case in which the store in the loop can be moved outside + in the versioned loop with alias checks. Note this loop won't + be vectorized. */ + +void test3 (int* a, int* b) +{ + int i; + for (i = 0; i < 100000; ++i) + *a += b[i]; +} + +/* A test case in which the load and store in the loop to b + can be moved outside in the versioned loop with alias checks. + Note this loop won't be vectorized. */ + +void test4 (int* a, int* b) +{ + int i; + for (i = 0; i < 100000; ++i) + { + *b += a[i]; + a[i] = *b; + } +} + +/* { dg-final { scan-tree-dump-times "hoist" 6 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 574446a..ff0403b 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -2477,6 +2477,88 @@ vect_loop_versioning (loop_vec_info loop_vinfo, adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); } + + /* Extract load statements on memrefs with zero-stride accesses. */ + + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) + { + /* In the loop body, we iterate each statement to check if it is a load. + Then we check the DR_STEP of the data reference. If DR_STEP is zero, + then we will hoist the load statement to the loop preheader. */ + + basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo); + int nbbs = loop->num_nodes; + + for (int i = 0; i < nbbs; ++i) + { + for (gimple_stmt_iterator si = gsi_start_bb (bbs[i]); + !gsi_end_p (si);) + { + gimple stmt = gsi_stmt (si); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + + if (dr && integer_zerop (DR_STEP (dr))) + { + if (DR_IS_READ (dr)) + { + if (dump_enabled_p ()) + { + dump_printf_loc + (MSG_NOTE, vect_location, + "hoisting out of the vectorized loop: "); + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); + dump_printf (MSG_NOTE, "\n"); + } + + gimple_set_vuse (stmt, NULL); + gsi_remove (&si, false); + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), + stmt); + } + continue; + } + else if (!dr && is_gimple_assign (stmt)) + { + bool hoist = true; + ssa_op_iter iter; + tree var; + + /* We hoist a statement if all SSA uses in it are defined + outside of the loop. */ + FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE) + { + gimple def = SSA_NAME_DEF_STMT (var); + if (!gimple_nop_p (def) + && flow_bb_inside_loop_p (loop, gimple_bb (def))) + { + hoist = false; + break; + } + } + + if (hoist) + { + gsi_remove (&si, false); + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), + stmt); + + if (dump_enabled_p ()) + { + dump_printf_loc + (MSG_NOTE, vect_location, + "hoisting out of the vectorized loop: "); + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); + dump_printf (MSG_NOTE, "\n"); + } + continue; + } + } + gsi_next (&si); + } + } + } + /* End loop-exit-fixes after versioning. */ if (cond_expr_stmt_list)