Re: [PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion

Richard Biener Thu, 28 May 2015 04:57:09 -0700

On Fri, May 22, 2015 at 5:42 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
> This example which I wrote to test ifconversion, currently fails to
> if-convert or vectorize:
>
> int foo ()
> {
>   for (int i = 0; i < 32 ; i++)
>     {
>       int m = (a[i] & i) ? 5 : 4;
>       b[i] = a[i] * m;
>     }
> }
>
> ...because jump-threading in dom1 rearranged the loop into a form that
> neither if-conversion nor vectorization would attempt. Discussion at
> https://gcc.gnu.org/ml/gcc/2015-04/msg00343.html lead to the suggestion that
> I should rerun loop-header copying (an earlier attempt to fix ifconversion,
> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01743.html, still did not
> enable vectorization.)
>
> This patch does so (and makes slightly less conservative, to tackle the
> example above). I found I had to make this a separate pass, so that the phi
> nodes were cleaned up at the end of the pass before running
> tree_if_conversion. Also at this stage in the compiler (inside loop opts) it
> was not possible to run loop_optimizer_init+finalize, or other
> loop_optimizer data structures needed later would be deleted; hence, I have
> two nearly-but-not-quite-identical passes, the new "ch_vect" avoiding the
> init/finalize. I tried to tackle this with some C++ subclassing, which
> removes the duplication, but the result feels a little ugly; suggestions for
> any neater approach welcome.
>
> This patch causes failure of the scan-tree-dump of dom2 in
> gcc.dg/ssa/pr21417.c. This looks for jump-threading to perform an
> optimization, but no longer finds the expected line in the log - as the
> loop-header-copying phase has already done an equivalent transformation
> *before* dom2. The final CFG is thus in the desired form, but I'm not sure
> how to determine this (scanning the CFG itself is very difficult, well
> beyond what we can do with regex, requiring looking at multiple lines and
> basic blocks). Can anyone advise? [The test issue can be worked around by
> preserving the old do_while_p logic for the first header-copying pass, and
> using the new logic only for the second, but this is more awkward inside the
> compiler, which feels wrong.]
>
> Besides the new vect-ifcvt-11.c, the testsuite actually has a couple of
> other examples where this patch enables (undesired!) vectorization. I've
> dealt with these, but for the record:
>         * gcc.dg/vect/slp-perm-7.c: the initialization loop in main,
> contained a check that input[i] < 200; this was already optimized out
> (because input[i] was set to i%256, where i<N with N #defined to 16), but
> that loop was not vectorized because:
> /work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3:
> note: not vectorized: latch block not empty.
> /work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3:
> note: bad loop form.
>
>         * gcc.dg/vect/vect-strided-a-u16-i4.c: the main1() function has
> three loops; the first (initialization) has an 'if (y) abort() /* Avoid
> vectorization.  */'. However, the 'volatile int y = 0' this was meant to
> reference, is actually shadowed by a local non-volatile; the test is thus
> peeled off and absent from the body of the loop. The loop only avoided
> vectorization because of non-empty latch and bad loop form, as previous.
>
> With this patch, both those loops now have good form, hence I have fixed
> both to check a global volatile to prevent these extraneous parts from being
> vectorized.
>
> Tested with bootstrap + check-gcc on x86_64 and AArch64 (linux). As noted
> above, this causes a spurious PASS->FAIL of a scan-tree-dump test, which I'm
> unsure how to fix, but no other regressions.


Apart from Jeffs comment - the usual fix for the undesired
vectorization is to put
a __asm__ volatile (""); in the loop.

+  /* If any block in the loop has an exit edge, and code after it, it is
+     not a do-while loop.  */
+  basic_block *body = get_loop_body (loop);
+  for (unsigned i = 0; i < loop->num_nodes; i++)

wouldn't it be easier to verify that the predecessor of the loop latch
contains the (only) loop exit?

Like

   e = single_exit (loop);
   if (!e)
     return true;

   if (single_exit (loop)->pred != single_pred (loop->latch))
     return false;

?  In fact I think that even for multiple exists we want the latch predecessor
have an exit (though the vectorizer or if-conversion don't deal with that).

Note that single_exit () only works when the loop state has
LOOPS_HAVE_RECORDED_EXITS
thus it might be easier to simply check

  FOR_EACH_EDGE (... single_pred (loop->latch)->succs ..)
     if (e->dest == loop->latch)
       ;
     else
       break;
  if (!e || !loop_exit_edge_p (loop, e))
    return true;

which should work always.

Coding-style wise, can you please move the "common" pass_ch_vect::execute out
of the pass_ch_vect class?

  unsigned int res = pass_ch_vect::execute (fun);

looks ugly, as well as deriving pass_ch from pass_ch_vect.  I think pass_ch_vect
should be only executed if flag_tree_loop_vectorize is enabled.

   loop_optimizer_init (LOOPS_NORMAL
-                      | LOOPS_HAVE_RECORDED_EXITS);
+                      | LOOPS_HAVE_RECORDED_EXITS
+                      | LOOPS_HAVE_PREHEADERS
+                      | LOOPS_HAVE_SIMPLE_LATCHES);

already included in LOOPS_NORMAL.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         * tree-pass.h (make_pass_ch_vect): New.
>         * passes.def: Add pass_ch_vect just before pass_if_conversion.
>
>         * tree-ssa-loop-ch.c (do_while_loop_p): For single-exit loops,
>         look for blocks with exit edge and code after them.
>         (pass_data_ch_vect, class pass_ch_vect, make_pass_ch_vect): New.
>         (class pass_ch): Extend pass_ch_vect.
>         (pass_ch::execute): Move all but loop_optimizer_init/finalize to...
>         (pass_ch_vect::execute): ...here.
>
>         * tree-ssa-loop.c (pass_tree_loop_init::execute): Add flags
>         LOOPS_HAVE_PREHEADERS and LOOPS_HAVE_SIMPLE_LATCHES.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/slp-perm-7.c (zero): New.
>         (main): Test zero rather than input[i], to avoid vectorization.
>         * gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of
> x,y,z,w.
>         of unsigned
>         * gcc.dg/vect/vect-ifcvt-11.c: New.
>

Re: [PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion

Reply via email to