The following patch adds an FRE pass after vectorization, which is
needed for IVOPTs to remove redundant PHI nodes (well, I'm also
testing a patch that makes FRE itself remove them).

The patch also makes FRE preserve loop-closed SSA form and thus
makes it suitable for use in the loop pipeline.
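
As a rough illustration of the condition involved: the patch treats a
block as holding loop-closed PHI nodes when it has a single predecessor
and the edge from that predecessor leaves the predecessor's innermost
loop.  The sketch below models that test on a toy CFG.  ToyBlock,
ToyCfg, block_in_loop, holds_loop_closed_phis and make_example are all
hypothetical names for this illustration and are not GCC's data
structures or API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG model (hypothetical, NOT GCC's basic_block / loop structures).
struct ToyBlock {
  int loop;                // innermost containing loop; 0 is the function body
  std::vector<int> preds;  // indices of predecessor blocks
};

struct ToyCfg {
  std::vector<int> loop_parent;  // loop_parent[l] is the enclosing loop, -1 for loop 0
  std::vector<ToyBlock> blocks;
};

// True if 'block' lies in 'loop' or in any loop nested inside it.
bool block_in_loop(const ToyCfg& cfg, int block, int loop) {
  for (int l = cfg.blocks[block].loop; l != -1; l = cfg.loop_parent[l])
    if (l == loop)
      return true;
  return false;
}

// Mirrors the shape of the check in the patch: a single predecessor
// whose edge into 'block' exits the predecessor's innermost loop.
bool holds_loop_closed_phis(const ToyCfg& cfg, int block) {
  assert(block >= 0 && static_cast<std::size_t>(block) < cfg.blocks.size());
  const ToyBlock& b = cfg.blocks[block];
  if (b.preds.size() != 1)
    return false;
  int src_loop = cfg.blocks[b.preds[0]].loop;
  return !block_in_loop(cfg, block, src_loop);
}

// Example CFG: 0 = preheader, 1 = loop header, 2 = latch, 3 = exit block.
ToyCfg make_example() {
  ToyCfg cfg;
  cfg.loop_parent = {-1, 0};          // loop 1 is nested in the function body
  cfg.blocks = {
      {0, {}},      // preheader, outside the loop
      {1, {0, 2}},  // header, entered from preheader and latch
      {1, {1}},     // latch, inside the loop
      {0, {1}},     // exit block, reached only through the loop exit edge
  };
  return cfg;
}
```

In this model only block 3 qualifies, so only its PHI nodes would be
kept even when value numbering finds an equivalent name for them.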

With the placement in the vectorizer sub-pass list, FRE will effectively
be enabled at -O3 only (well, or if one requests loop vectorization).
I've considered placing it after complete_unroll instead but that
would enable it at -O1 already.  I have no strong opinion on the
exact placement, but it should help all passes between vectorizing
and ivopts for vectorized loops.

Yeah, it adds yet another pass and thus I don't like it very much.
But, for example, it improves the code generated for
gfortran.dg/vect/fast-math-pr37021.f90
from

.L14:
        movupd  (%r11), %xmm3
        addl    $1, %ecx
        addq    %rax, %r11
        movupd  (%r8), %xmm0
        addq    %rax, %r8
        unpckhpd        %xmm3, %xmm3
        movupd  (%rdi), %xmm2
        unpcklpd        %xmm0, %xmm0
        addq    %rsi, %rdi
        movupd  (%rbx), %xmm1
        mulpd   %xmm3, %xmm2
        addq    %rsi, %rbx
        cmpl    %ecx, %ebp
        palignr $8, %xmm1, %xmm1
        mulpd   %xmm1, %xmm0
        movapd  %xmm2, %xmm1
        addpd   %xmm0, %xmm1
        subpd   %xmm2, %xmm0
        shufpd  $2, %xmm0, %xmm1
        addpd   %xmm1, %xmm4
        jne     .L14

to

.L14:
        movupd  (%r8), %xmm0
        addl    $1, %ecx
        addq    %rax, %r8
        movapd  %xmm0, %xmm2
        movupd  (%rdi), %xmm1
        addq    %rsi, %rdi
        cmpl    %ecx, %r11d
        unpckhpd        %xmm0, %xmm2
        unpcklpd        %xmm0, %xmm0
        mulpd   %xmm1, %xmm2
        palignr $8, %xmm1, %xmm1
        mulpd   %xmm1, %xmm0
        movapd  %xmm2, %xmm1
        addpd   %xmm0, %xmm1
        subpd   %xmm2, %xmm0
        shufpd  $2, %xmm0, %xmm1
        addpd   %xmm1, %xmm3
        jne     .L14

(yeah, the vectorizer happily generates redundant loads and one IV
for each such load)
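
A scalar analogy of that redundancy, as a sketch only: it is not the
vectorizer's actual output and is not derived from the Fortran testcase.
The first function walks the same array with two pointers, each with its
own induction variable, and loads the same element twice.  The second is
what is left once the duplicate load and its IV are recognized as
redundant.  The function names and the arithmetic are made up for the
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Before": two pointers step through 'a' in lockstep, so every
// iteration loads the same element twice and bumps two pointers.
double sum_redundant(const std::vector<double>& a,
                     const std::vector<double>& b) {
  assert(a.size() == b.size());
  const double* pa1 = a.data();
  const double* pa2 = a.data();  // always equal to pa1
  const double* pb = b.data();
  double acc = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    double x = *pa1;  // first load of a[i]
    double y = *pa2;  // redundant second load of a[i]
    acc += x * *pb + y;
    ++pa1;
    ++pa2;            // redundant induction variable
    ++pb;
  }
  return acc;
}

// "After": one load and one pointer per distinct address stream.
double sum_deduplicated(const std::vector<double>& a,
                        const std::vector<double>& b) {
  assert(a.size() == b.size());
  const double* pa = a.data();
  const double* pb = b.data();
  double acc = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    double x = *pa;   // single load, reused for both operands
    acc += x * *pb + x;
    ++pa;
    ++pb;
  }
  return acc;
}
```

Both functions compute the same value; the second simply does less
address arithmetic and fewer loads per iteration, which is the kind of
difference visible between the two assembly listings above.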

Any other suggestions on pass placement?  I can of course key
that FRE run on -O3 explicitly.  I'm not sure we want to start
playing fancy games at this point, like having a pass set a
property when it (likely) generated redundancies worth fixing up
and then keying FRE on that property (it gets harder and less
predictable which transforms are run on the code).

Bootstrap / regtest running on x86_64-unknown-linux-gnu.  With
other placements I'd expect quite some testsuite fallout
eventually.

Thoughts?

Thanks,
Richard.

2015-06-10  Richard Biener  <rguent...@suse.de>

        * passes.def (pass_vectorize): Add pass_fre.
        * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
        Preserve loop-closed SSA form.

Index: gcc/passes.def
===================================================================
*** gcc/passes.def      (revision 224324)
--- gcc/passes.def      (working copy)
*************** along with GCC; see the file COPYING3.
*** 252,257 ****
--- 252,258 ----
             Please do not add any other passes in between.  */
          NEXT_PASS (pass_vectorize);
            PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
+             NEXT_PASS (pass_fre);
              NEXT_PASS (pass_dce);
            POP_INSERT_PASSES ()
            NEXT_PASS (pass_predcom);
Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c  (revision 224324)
--- gcc/tree-ssa-pre.c  (working copy)
*************** eliminate_dom_walker::before_dom_childre
*** 4013,4018 ****
--- 4013,4028 ----
       tailmerging.  Eventually we can reduce its reliance on SCCVN now
       that we fully copy/constant-propagate (most) things.  */
  
+   /* Compute whether this block has loop-closed PHI nodes we need
+      to preserve.  */
+   bool lc_phi = false;
+   edge e;
+   if (loops_state_satisfies_p (LOOP_CLOSED_SSA)
+       && single_pred_p (b)
+       && (e = single_pred_edge (b))
+       && loop_exit_edge_p (e->src->loop_father, e))
+     lc_phi = true;
+ 
    for (gphi_iterator gsi = gsi_start_phis (b); !gsi_end_p (gsi);)
      {
        gphi *phi = gsi.phi ();
*************** eliminate_dom_walker::before_dom_childre
*** 4026,4032 ****
  
        tree sprime = eliminate_avail (res);
        if (sprime
!         && sprime != res)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            {
--- 4036,4043 ----
  
        tree sprime = eliminate_avail (res);
        if (sprime
!         && sprime != res
!         && !lc_phi)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            {
*************** eliminate_dom_walker::before_dom_childre
*** 4466,4472 ****
  
    /* Replace destination PHI arguments.  */
    edge_iterator ei;
-   edge e;
    FOR_EACH_EDGE (e, ei, b->succs)
      {
        for (gphi_iterator gsi = gsi_start_phis (e->dest);
--- 4477,4482 ----
