On Thu, 5 Dec 2013, Cong Hou wrote: > Hi Richard > > You mentioned that Micha has a patch pending that enables of zero-step > stores. What is the status of this patch? I could not find it through > searching "Micha".
Well, it's pending on his harddisk :/ Too late for 4.9. Richard. > Thank you! > > > Cong > > > On Wed, Oct 16, 2013 at 2:02 AM, Richard Biener <rguent...@suse.de> wrote: > > On Tue, 15 Oct 2013, Cong Hou wrote: > > > >> Thank you for your reminder, Jeff! I just noticed Richard's comment. I > >> have modified the patch according to that. > >> > >> The new patch is attached. > > > > (posting patches inline is easier for review, now you have to deal > > with no quoting markers ;)) > > > > Comments inline. > > > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > > index 8a38316..2637309 100644 > > --- a/gcc/ChangeLog > > +++ b/gcc/ChangeLog > > @@ -1,3 +1,8 @@ > > +2013-10-15 Cong Hou <co...@google.com> > > + > > + * tree-vect-loop-manip.c (vect_loop_versioning): Hoist loop > > invariant > > + statement that contains data refs with zero-step. > > + > > 2013-10-14 David Malcolm <dmalc...@redhat.com> > > > > * dumpfile.h (gcc::dump_manager): New class, to hold state > > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog > > index 075d071..9d0f4a5 100644 > > --- a/gcc/testsuite/ChangeLog > > +++ b/gcc/testsuite/ChangeLog > > @@ -1,3 +1,7 @@ > > +2013-10-15 Cong Hou <co...@google.com> > > + > > + * gcc.dg/vect/pr58508.c: New test. > > + > > 2013-10-14 Tobias Burnus <bur...@net-b.de> > > > > PR fortran/58658 > > diff --git a/gcc/testsuite/gcc.dg/vect/pr58508.c > > b/gcc/testsuite/gcc.dg/vect/pr58508.c > > new file mode 100644 > > index 0000000..cb22b50 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/pr58508.c > > @@ -0,0 +1,20 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ > > + > > + > > +/* The GCC vectorizer generates loop versioning for the following loop > > + since there may exist aliasing between A and B. The predicate checks > > + if A may alias with B across all iterations. Then for the loop in > > + the true body, we can assert that *B is a loop invariant so that > > + we can hoist the load of *B before the loop body. */ > > + > > +void foo (int* a, int* b) > > +{ > > + int i; > > + for (i = 0; i < 100000; ++i) > > + a[i] = *b + 1; > > +} > > + > > + > > +/* { dg-final { scan-tree-dump-times "hoist" 2 "vect" } } */ > > +/* { dg-final { cleanup-tree-dump "vect" } } */ > > diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c > > index 574446a..f4fdec2 100644 > > --- a/gcc/tree-vect-loop-manip.c > > +++ b/gcc/tree-vect-loop-manip.c > > @@ -2477,6 +2477,92 @@ vect_loop_versioning (loop_vec_info loop_vinfo, > > adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); > > } > > > > > > Note that applying this kind of transform at this point invalidates > > some of the earlier analysis the vectorizer performed (namely the > > def-kind which now effectively gets vect_external_def from > > vect_internal_def). In this case it doesn't seem to cause any > > issues (we re-compute the def-kind everytime we need it (how wasteful)). > > > > + /* Extract load and store statements on pointers with zero-stride > > + accesses. */ > > + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) > > + { > > + /* In the loop body, we iterate each statement to check if it is a > > load > > + or store. Then we check the DR_STEP of the data reference. If > > + DR_STEP is zero, then we will hoist the load statement to the loop > > + preheader, and move the store statement to the loop exit. */ > > > > We don't move the store yet. Micha has a patch pending that enables > > vectorization of zero-step stores. > > > > + for (gimple_stmt_iterator si = gsi_start_bb (loop->header); > > + !gsi_end_p (si);) > > > > While technically ok now (vectorized loops contain a single basic block) > > please use LOOP_VINFO_BBS () to get at the vector of basic-blcoks > > and iterate over them like other code does. > > > > + { > > + gimple stmt = gsi_stmt (si); > > + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > > + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); > > + > > + if (dr && integer_zerop (DR_STEP (dr))) > > + { > > + if (DR_IS_READ (dr)) > > + { > > + if (dump_enabled_p ()) > > + { > > + dump_printf_loc > > + (MSG_NOTE, vect_location, > > + "hoist the statement to outside of the loop "); > > > > "hoisting out of the vectorized loop: " > > > > + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); > > + dump_printf (MSG_NOTE, "\n"); > > + } > > + > > + gsi_remove (&si, false); > > + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), > > stmt); > > > > Note that this will result in a bogus VUSE on the stmt at this point which > > will be only fixed because of implementation details of loop versioning. > > Either get the correct VUSE from the loop header virtual PHI node > > preheader edge (if there is none then the current VUSE is the correct one > > to use) or clear it. > > > > + } > > + /* TODO: We also consider vectorizing loops containing > > zero-step > > + data refs as writes. For example: > > + > > + int a[N], *s; > > + for (i = 0; i < N; i++) > > + *s += a[i]; > > + > > + In this case the write to *s can be also moved after the > > + loop. */ > > > > Note that you then invalidate even more vectorizer analysis - you > > basically introduce a scalar reduction (you have to add a PHI node). > > Which means that the transform has to happen elsewhere. > > > > As Jakub now tries with if-conversion this would also be a candidate > > for applying the loop versioning before even starting the rest of the > > vectorizer analysis code. > > > > That said, I'd remove the TODO at this point because it's clearly not > > possible to implement just here ;) > > > > + continue; > > + } > > + else if (!dr) > > + { > > + bool hoist = true; > > + for (size_t i = 0; i < gimple_num_ops (stmt); i++) > > > > You are checking all kinds of statements, including assignments > > of which you are also checking the LHS ... restricting to > > assignments you can start walking at i = 1. > > > > + { > > + tree op = gimple_op (stmt, i); > > + if (TREE_CODE (op) == INTEGER_CST > > + || TREE_CODE (op) == REAL_CST) > > > > There are other constants as well - just check > > > > if (is_gimple_min_invariant (op)) > > > > + continue; > > + if (TREE_CODE (op) == SSA_NAME) > > + { > > + gimple def = SSA_NAME_DEF_STMT (op); > > + if (def == stmt > > > > with starting at op 1 you can drop this. > > > > + || gimple_nop_p (def) > > + || !flow_bb_inside_loop_p (loop, gimple_bb (def))) > > + continue; > > + } > > > > Note that you fail to hoist if-converted code this way because > > op1 of > > > > name_1 = a_2 < b_4 ? x_5 : y_6; > > > > is 'a_2 < b_4', a tree expression and not an SSA name (ugh). Maybe > > we don't care (it's just a missed optimization), but if you are > > restricting yourself to hoisting assignments without a data-ref > > then you can walk over SSA uses on the stmt (instead of over gimple > > ops) with > > > > FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_USE) > > > > which would automagically take care of that case. > > > > Can you add a testcase which involves invariant if-conversion? > > Can you add a testcase with just an invariant store to make sure > > we don't wreck it? Can you add a testcase with an invariant store > > and load (the reduction case), too? > > > > + hoist = false; > > + break; > > + } > > > > For example you'll hoist all labels this way (no ops), as well as the > > loop exit GIMPLE_COND in case it's operands were loop invariant, > > breaking the CFG ... so please add && is_gimple_assign () to the > > if (!dr) check ;) > > > > + if (hoist) > > + { > > + gsi_remove (&si, false); > > + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), > > stmt); > > + > > + if (dump_enabled_p ()) > > + { > > + dump_printf_loc > > + (MSG_NOTE, vect_location, > > + "hoist the statement to outside of the loop "); > > + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); > > + dump_printf (MSG_NOTE, "\n"); > > + } > > + continue; > > + } > > + } > > + gsi_next (&si); > > + } > > + } > > + > > /* End loop-exit-fixes after versioning. */ > > > > if (cond_expr_stmt_list) > > > >> > >> thanks, > >> Cong > >> > >> > >> On Tue, Oct 15, 2013 at 12:33 PM, Jeff Law <l...@redhat.com> wrote: > >> > On 10/14/13 17:31, Cong Hou wrote: > >> >> > >> >> Any comment on this patch? > >> > > >> > Richi replied in the BZ you opened. > >> > > >> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58508 > >> > > >> > Essentially he said emit the load on the edge rather than in the block > >> > itself. > >> > jeff > >> > > > -- Richard Biener <rguent...@suse.de> SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer