Re: [RFC] Implement load sinking in loops

Richard Guenther Mon, 08 Oct 2012 05:32:49 -0700

On Mon, Oct 8, 2012 at 12:38 PM, Eric Botcazou <ebotca...@adacore.com> wrote:
> Hi,
>
> we recently noticed that, even at -O3, the compiler doesn't figure out that
> the following loop is dumb:
>
> #define SIZE 64
>
> int foo (int v[])
> {
>   int r;
>
>   for (i = 0; i < SIZE; i++)
>     r = v[i];
>
>   return r;
> }
>
> which was a bit of a surprise.  On second thoughts, this isn't entirely
> unexpected, as it probably matters only for (slightly) pathological cases.
> The attached patch nevertheless implements a form of load sinking in loops so
> as to optimize these cases.  It's combined with invariant motion to optimize:
>
> int foo (int v[], int a)
> {
>   int r, i;
>
>   for (i = 0; i < SIZE; i++)
>     r = v[i] + a;
>
>   return r;
> }
>
> and with store sinking to optimize:
>
> int foo (int v1[], int v2[])
> {
>   int r[SIZE];
>   int i, j;
>
>   for (j = 0; j < SIZE; j++)
>     for (i = 0; i < SIZE; i++)
>       r[j] = v1[j] + v2[i];
>
>   return r[SIZE - 1];
> }
>
> The optimization is enabled at -O2 in the patch for measurement purposes but,
> given how rarely it triggers (e.g. exactly 10 occurrences in a GCC bootstrap,
> compiler-only, all languages except Go), it's probably best suited to -O3.
> Or perhaps we don't care and it should simply be dropped...  Thoughts?


Incidentially we have scev-const-prop to deal with the similar case of
scalar computations.  But I realize this doesn't work for expressions that
are dependent on a loop variant load.

@@ -103,6 +103,7 @@ typedef struct mem_ref_loc
 {
   tree *ref;                   /* The reference itself.  */
   gimple stmt;                 /* The statement in that it occurs.  */
+  tree lhs;                    /* The (ultimate) LHS for a load.  */
 } *mem_ref_loc_p;

isn't that the lhs of stmt?

+static gimple_seq
+copy_load_and_single_use_chain (mem_ref_loc_p loc, tree *new_lhs)
+{
+  tree mem = *loc->ref;
+  tree lhs, tmp_var, ssa_name;
+  gimple_seq seq = NULL;
+  gimple stmt;
+  unsigned n = 0;
+
+  /* First copy the load and create the new LHS for it.  */
+  lhs = gimple_assign_lhs (loc->stmt);
+  tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++));

use make_temp_ssa_name or simply copy_ssa_name (not sure you need
fancy names here).

+      if (gimple_assign_rhs1 (use_stmt) == lhs)
+       {
+         op1 = ssa_name;
+         op2 = gimple_assign_rhs2 (use_stmt);
+       }
+      else
+       {
+         op1 = gimple_assign_rhs1 (use_stmt);
+         op2 = ssa_name;
+       }

this may enlarge lifetime of the other operand?  And it looks like it would
break with unary stmts (accessing out-of-bounds op2).  Also for
is_gimple_min_invariant other operand which may be for example &a.b
you need to unshare_expr it.

+      lhs = gimple_assign_lhs (use_stmt);
+      tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++));
+      stmt = gimple_build_assign_with_ops (rhs_code, tmp_var, op1, op2);
+      ssa_name = make_ssa_name (tmp_var, stmt);
+      gimple_assign_set_lhs (stmt, ssa_name);

see above.  This can now be simplified to

   lhs = gimple_assign_lhs (use_stmt);
   ssa_name = copy_ssa_name (lhs, NULL);
   stmt = gimple_build_assign_with_ops (rhs_code, ssa_name, op1, op2);

Btw - isn't this all a bit backward (I mean the analysis in execute_lm?)
What you want is apply this transform to as much of the _DEF_s of
the loop-closed PHI nodes - only values used outside of the loop are
interesting.  Thats (sort-of) what SCEV const-prop does (well, it also
uses SCEV to compute the overall effect of the iterations).  So what
you want to know is whether when walking the DEF chain of the
loop closed PHI you end up at definitions before the loop or at
definitions that are not otherwise used inside the loop.

Which means it is really expression sinking.  Does tree-ssa-sink manage
to sink anything out of a loop?  Even scalar computation parts I mean?  For

 for (..)
   {
     a = x[i];
     y[i] = a;
     b = a * 2;
   }
  ... = b;

it should be able to sink b = a*2.

So I think the more natural place to implement this is either SCEV cprop
or tree-ssa-sink.c.  And process things from the loop-closed PHI use
walking the DEFs (first process all, marking interesting things to also
catch commonly used exprs for two PHI uses).

Again you might simply want to open a bugreport for this unless you
want to implement it yourself.

Thanks,
Richard.

> Tested on x86_64-suse-linux.
>
>
> 2012-10-08  Eric Botcazou  <ebotca...@adacore.com>
>
>         * gimple.h (gsi_insert_seq_on_edge_before): Declare.
>         * gimple-iterator.c (gsi_insert_seq_on_edge_before): New function.
>         * tree-ssa-loop-im.c (struct mem_ref_loc): Add LHS field.
>         (mem_ref_in_stmt): Remove gcc_assert.
>         (copy_load_and_single_use_chain): New function.
>         (execute_lm): Likewise.
>         (hoist_memory_references): Hoist the loads after the stores.
>         (ref_always_accessed_p): Rename into...
>         (ref_always_stored_p): ...this.  Remove STORE_P and add ONCE_P.
>         (can_lsm_ref_p): New function extracted from...
>         (can_sm_ref_p): ...here.  Call it.
>         (follow_invariant_single_use_chain): New function.
>         (can_lm_ref_p): Likewise.
>         (find_refs_for_sm): Rename into..
>         (find_refs_for_lsm): ...this.  Find load hoisting opportunities.
>         (loop_suitable_for_sm): Rename into...
>         (loop_suitable_for_lsm): ...this.
>         (store_motion_loop): Rename into...
>         (load_store_motion_loop): ...this.  Adjust calls to above functions.
>         (tree_ssa_lim): Likewise.
>
>
> 2012-10-08  Eric Botcazou  <ebotca...@adacore.com>
>
>         * gcc.dg/tree-ssa/loadmotion-1.c: New test.
>         * gcc.dg/tree-ssa/loadmotion-2.c: New test.
>         * gcc.dg/tree-ssa/loadmotion-3.c: New test.
>
>
> --
> Eric Botcazou

Re: [RFC] Implement load sinking in loops

Reply via email to