On Fri, May 22, 2015 at 5:42 PM, Alan Lawrence <alan.lawre...@arm.com> wrote: > This example which I wrote to test ifconversion, currently fails to > if-convert or vectorize: > > int foo () > { > for (int i = 0; i < 32 ; i++) > { > int m = (a[i] & i) ? 5 : 4; > b[i] = a[i] * m; > } > } > > ...because jump-threading in dom1 rearranged the loop into a form that > neither if-conversion nor vectorization would attempt. Discussion at > https://gcc.gnu.org/ml/gcc/2015-04/msg00343.html lead to the suggestion that > I should rerun loop-header copying (an earlier attempt to fix ifconversion, > https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01743.html, still did not > enable vectorization.) > > This patch does so (and makes slightly less conservative, to tackle the > example above). I found I had to make this a separate pass, so that the phi > nodes were cleaned up at the end of the pass before running > tree_if_conversion. Also at this stage in the compiler (inside loop opts) it > was not possible to run loop_optimizer_init+finalize, or other > loop_optimizer data structures needed later would be deleted; hence, I have > two nearly-but-not-quite-identical passes, the new "ch_vect" avoiding the > init/finalize. I tried to tackle this with some C++ subclassing, which > removes the duplication, but the result feels a little ugly; suggestions for > any neater approach welcome. > > This patch causes failure of the scan-tree-dump of dom2 in > gcc.dg/ssa/pr21417.c. This looks for jump-threading to perform an > optimization, but no longer finds the expected line in the log - as the > loop-header-copying phase has already done an equivalent transformation > *before* dom2. The final CFG is thus in the desired form, but I'm not sure > how to determine this (scanning the CFG itself is very difficult, well > beyond what we can do with regex, requiring looking at multiple lines and > basic blocks). Can anyone advise? [The test issue can be worked around by > preserving the old do_while_p logic for the first header-copying pass, and > using the new logic only for the second, but this is more awkward inside the > compiler, which feels wrong.] > > Besides the new vect-ifcvt-11.c, the testsuite actually has a couple of > other examples where this patch enables (undesired!) vectorization. I've > dealt with these, but for the record: > * gcc.dg/vect/slp-perm-7.c: the initialization loop in main, > contained a check that input[i] < 200; this was already optimized out > (because input[i] was set to i%256, where i<N with N #defined to 16), but > that loop was not vectorized because: > /work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3: > note: not vectorized: latch block not empty. > /work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3: > note: bad loop form. > > * gcc.dg/vect/vect-strided-a-u16-i4.c: the main1() function has > three loops; the first (initialization) has an 'if (y) abort() /* Avoid > vectorization. */'. However, the 'volatile int y = 0' this was meant to > reference, is actually shadowed by a local non-volatile; the test is thus > peeled off and absent from the body of the loop. The loop only avoided > vectorization because of non-empty latch and bad loop form, as previous. > > With this patch, both those loops now have good form, hence I have fixed > both to check a global volatile to prevent these extraneous parts from being > vectorized. > > Tested with bootstrap + check-gcc on x86_64 and AArch64 (linux). As noted > above, this causes a spurious PASS->FAIL of a scan-tree-dump test, which I'm > unsure how to fix, but no other regressions.
Apart from Jeffs comment - the usual fix for the undesired vectorization is to put a __asm__ volatile (""); in the loop. + /* If any block in the loop has an exit edge, and code after it, it is + not a do-while loop. */ + basic_block *body = get_loop_body (loop); + for (unsigned i = 0; i < loop->num_nodes; i++) wouldn't it be easier to verify that the predecessor of the loop latch contains the (only) loop exit? Like e = single_exit (loop); if (!e) return true; if (single_exit (loop)->pred != single_pred (loop->latch)) return false; ? In fact I think that even for multiple exists we want the latch predecessor have an exit (though the vectorizer or if-conversion don't deal with that). Note that single_exit () only works when the loop state has LOOPS_HAVE_RECORDED_EXITS thus it might be easier to simply check FOR_EACH_EDGE (... single_pred (loop->latch)->succs ..) if (e->dest == loop->latch) ; else break; if (!e || !loop_exit_edge_p (loop, e)) return true; which should work always. Coding-style wise, can you please move the "common" pass_ch_vect::execute out of the pass_ch_vect class? unsigned int res = pass_ch_vect::execute (fun); looks ugly, as well as deriving pass_ch from pass_ch_vect. I think pass_ch_vect should be only executed if flag_tree_loop_vectorize is enabled. loop_optimizer_init (LOOPS_NORMAL - | LOOPS_HAVE_RECORDED_EXITS); + | LOOPS_HAVE_RECORDED_EXITS + | LOOPS_HAVE_PREHEADERS + | LOOPS_HAVE_SIMPLE_LATCHES); already included in LOOPS_NORMAL. Thanks, Richard. > gcc/ChangeLog: > > * tree-pass.h (make_pass_ch_vect): New. > * passes.def: Add pass_ch_vect just before pass_if_conversion. > > * tree-ssa-loop-ch.c (do_while_loop_p): For single-exit loops, > look for blocks with exit edge and code after them. > (pass_data_ch_vect, class pass_ch_vect, make_pass_ch_vect): New. > (class pass_ch): Extend pass_ch_vect. > (pass_ch::execute): Move all but loop_optimizer_init/finalize to... > (pass_ch_vect::execute): ...here. > > * tree-ssa-loop.c (pass_tree_loop_init::execute): Add flags > LOOPS_HAVE_PREHEADERS and LOOPS_HAVE_SIMPLE_LATCHES. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/slp-perm-7.c (zero): New. > (main): Test zero rather than input[i], to avoid vectorization. > * gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of > x,y,z,w. > of unsigned > * gcc.dg/vect/vect-ifcvt-11.c: New. >