On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> For the moment, tree-predcom.c only supports invariant/load-loads/store-loads 
> chains.
> This patch generalizes dead store elimination (or store motion) across loop 
> iterations in
> predictive commoning pass by supporting store-store chain.  As comment in the 
> patch:
>
>    Apart from predictive commoning on Load-Load and Store-Load chains, we
>    also support Store-Store chains -- stores killed by other store can be
>    eliminated.  Given below example:
>
>      for (i = 0; i < n; i++)
>        {
>          a[i] = 1;
>          a[i+2] = 2;
>        }
>
>    It can be replaced with:
>
>      t0 = a[0];
>      t1 = a[1];
>      for (i = 0; i < n; i++)
>        {
>          a[i] = 1;
>          t2 = 2;
>          t0 = t1;
>          t1 = t2;
>        }
>      a[n] = t0;
>      a[n+1] = t1;
>
>    If the loop runs more than 1 iterations, it can be further simplified into:
>
>      for (i = 0; i < n; i++)
>        {
>          a[i] = 1;
>        }
>      a[n] = 2;
>      a[n+1] = 2;
>
>    The interesting part is this can be viewed either as general store motion
>    or general dead store elimination in either intra/inter-iterations way.
>
> There are number of interesting facts about this enhancement:
> a) This patch supports dead store elimination for both across-iteration case 
> and single-iteration
>      case.  For the latter, it is dead store elimination.
> b) There are advantages supporting dead store elimination in predcom, for 
> example, it has
>      complete information about memory address.  On the contrary, DSE pass 
> can only handle
>      memory references with exact the same memory address expression.
> c) It's cheap to support store-stores chain in predcom based on existing code.
> d) As commented, the enhancement can be viewed as either generalized dead 
> store elimination
>      or generalized store motion.  I prefer DSE here.
>
> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?

Looks mostly ok.  I have a few questions though.

+  /* Don't do store elimination if loop has multiple exit edges.  */
+  bool eliminate_store_p = single_exit (loop) != NULL;

handling this would be an enhancement?  IIRC LIM store-motion handles this
just fine by emitting code on all exits.

@@ -1773,6 +2003,9 @@ determine_unroll_factor (vec<chain_p> chains)
     {
       if (chain->type == CT_INVARIANT)
        continue;
+      /* Don't unroll when eliminating stores.  */
+      else if (chain->type == CT_STORE_STORE)
+       return 1;

this is a hard exit value so we do not handle the case where another chain
in the loop would want to unroll? (enhancement?)  I'd have expected to do
the same as for CT_INVARIANT here.

+      tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
+      if (!chain->all_always_accessed && tree_could_trap_p (init))
+       {
+         gimple_seq_discard (stmts);
+         return false;

so this is the only place that remotely cares for not always performed stores.
But as-is the patch doesn't seem to avoid speculating stores and thus
violates the C++ memory model, aka, introduces store-data-races?  The LIM
store-motion code was fixed to avoid this by keeping track of whether a BB
has executed to guard the stores done in the compensation code on the loop
exit.

That said, to "fix" this all && tree_could_trap_p cases would need to be removed
(or similarly flag vars be introduced).  Speculating loads that do not
trap is ok
(might only introduce false uninit use reports by tools like valgrind).

Thanks,
Richard.

> Thanks,
> bin
> 2017-06-21  Bin Cheng  <bin.ch...@arm.com>
>
>         * tree-predcom.c: Revise general description of pass.
>         (enum chain_type): New enum type for store elimination.
>         (struct chain): New field supporting store elimination.
>         (dump_chain): Dump store-stores chain.
>         (release_chain): Release resources.
>         (split_data_refs_to_components): Compute and create component
>         contains only stores for elimination.
>         (get_chain_last_ref_at): New function.
>         (make_invariant_chain): Initialization.
>         (make_rooted_chain): Specify chain type in parameter.
>         (add_looparound_copies): Skip for store-stores chain.
>         (determine_roots_comp): Compute type of chain and pass it to
>         make_rooted_chain.
>         (initialize_root_vars_store_elim_2): New function.
>         (finalize_eliminated_stores): New function.
>         (remove_stmt): Handle store for elimination.
>         (execute_pred_commoning_chain): Execute predictive commoning on
>         store-store chains.
>         (determine_unroll_factor): Skip unroll for store-stores chain.
>         (prepare_initializers_chain_store_elim): New function.
>         (prepare_initializers_chain): Hanlde store-store chain.
>         (prepare_finalizers_chain, prepare_finalizers): New function.
>         (tree_predictive_commoning_loop): Return integer value indicating
>         if loop is unrolled or lcssa form is corrupted.
>         (tree_predictive_commoning): Rewrite for lcssa form if necessary.
>
> gcc/testsuite/ChangeLog
> 2017-06-21  Bin Cheng  <bin.ch...@arm.com>
>
>         * gcc.dg/tree-ssa/predcom-dse-1.c: New test.
>         * gcc.dg/tree-ssa/predcom-dse-2.c: New test.
>         * gcc.dg/tree-ssa/predcom-dse-3.c: New test.
>         * gcc.dg/tree-ssa/predcom-dse-4.c: New test.
>         * gcc.dg/tree-ssa/predcom-dse-5.c: New test.
>         * gcc.dg/tree-ssa/predcom-dse-6.c: New test.
>         * gcc.dg/tree-ssa/predcom-dse-7.c: New test.
>         * gcc.dg/tree-ssa/predcom-dse-8.c: New test.
>         * gcc.dg/tree-ssa/predcom-dse-9.c: New test.

Reply via email to