Hi Jakub,

On 02/11/17 14:10, Jakub Jelinek wrote:
Hi!

The following patch improves store merging, so that it doesn't handle
just constant stores into adjacent memory, but also adjacent memory copying
and simple bitwise logical ops where at least one argument is a load
from adjacent memory and the other argument is either such a load as well
or a constant.  The loads are limited to either all using the same vuse,
or each using the vuse of its corresponding store.  So examples of what
can be handled are:
   s.a = 1; s.b = 2; // we could handle this before this patch already
   _1 = t.a; _2 = t.b; s.a = _1; s.b = _2; // copying with the same vuse
   _1 = t.a; s.a = _1; _2 = t.b; s.b = _2; // copying with vuse of the store
   _1 = s.a; _2 = _1 | 23; _3 = s.b; _4 = _3 | 12345; s.a = _2; s.b = _4; // | with one load and one constant
etc.
What the patch doesn't handle yet, because terminate_all_aliasing_chains
uses:
       /* We can't use the base object here as that does not reliably exist.
          Build a ao_ref from the base object address (if we know the
          minimum and maximum offset and the maximum size we could improve
          things here).  */
       ao_ref chain_ref;
       ao_ref_init_from_ptr_and_size (&chain_ref, cur->base_addr, NULL_TREE);
is e.g.
void
f3 (struct S *__restrict p, struct S *__restrict q)
{
   p->a |= q->a; p->b |= q->b; p->c |= q->c; p->d |= q->d;
   p->e |= q->e; p->f |= q->f; p->g |= q->g; p->h |= q->h;
}
I'll try to improve that incrementally by preserving the underlying original
reference and tracking minimum/maximum offsets from that.

The patch also doesn't hook in the bswap infrastructure to recognize say
struct S { char a, b, c, d; } u, v;
void foo (void) { u.a = v.d; u.b = v.c; u.c = v.b; u.d = v.a; }
though I wonder if it is worth it (whether there is any real-world code like
that at all, or common enough to be worth the work on it).

Bootstrapped/regtested on {x86_64,i686,powerpc64,powerpc64le}-linux, ok for
trunk?

I'm now doing another x86_64/i686 bootstrap/regtest to gather some
statistics; both are still regtesting, but the current numbers show:
rhs_code        split_stores.length ()          orig_num_stmts
integer_cst     135533                          275746
mem_ref         13289                           27852
bit_*_expr      36                              81
so the first row shows that already before this patch, when we decided to
optimize constant stores, we decreased the number of stores to about 50%
of the original statements on average; memory copying triggers in roughly
10% as many cases as the constant stores, and the reason the bitwise
logical ops don't trigger much is probably the above mentioned
ao_ref_init* missed optimization, as well as such constructs being far
less common.  In theory we could also handle mixed rhs codes, but I'm not
sure it is worth the effort - e.g. if somebody does:
   s.a = 5; s.b |= 4; s.c &= 2; s.d ^= 5;
we could load the memory and do some |/&/^ on it.

This looks great! I have a couple of comments.
* Can you please extend the file comment for gimple-ssa-store-merging.c?
Currently it mostly describes how we merge constants together.  Once we
start accepting non-constant values we should mention that in there.
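
To make it concrete, a rough sketch of the kind of wording I had in mind
(based just on your description above, feel free to rephrase):

   /* In addition to merging stores of constants, the pass also merges
      stores whose value is loaded from adjacent memory, or is the result
      of a bitwise logical operation on such a load and another such load
      or a constant.  */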

* Can we also handle BIT_NOT_EXPRs?  i.e. copying memory locations but
with a unary op applied on top.  I don't know how often that comes up
though.  Maybe it will complicate store_operand_info and its handling
too much to be worth it...
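
Something like this is what I have in mind (made-up example, just for
illustration):

struct S { char a, b, c, d; };

void
f (struct S *__restrict p, struct S *__restrict q)
{
  p->a = ~q->a; p->b = ~q->b; p->c = ~q->c; p->d = ~q->d;
}
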
2017-11-02  Jakub Jelinek  <ja...@redhat.com>

        PR tree-optimization/78821
        * gimple-ssa-store-merging.c (struct store_operand_info): New type.
        (store_operand_info::store_operand_info): New constructor.
        (struct store_immediate_info): Add rhs_code and ops data members.
        (store_immediate_info::store_immediate_info): Add rhscode, op0r
        and op1r arguments to the ctor, initialize corresponding data members.
        (struct merged_store_group): Add load_align_base and load_align
        data members.
        (merged_store_group::merged_store_group): Initialize them.
        (merged_store_group::do_merge): Update them.
        (merged_store_group::apply_stores): Pick the constant for
        encode_tree_to_bitpos from one of the two operands, or skip
        encode_tree_to_bitpos if neither operand is a constant.
        (class pass_store_merging): Add process_store method decl.  Remove
        bool argument from terminate_all_aliasing_chains method decl.
        (pass_store_merging::terminate_all_aliasing_chains): Remove
        var_offset_p argument and corresponding handling.
        (stmts_may_clobber_ref_p): New function.
        (compatible_load_p): New function.
        (imm_store_chain_info::coalesce_immediate_stores): Terminate group
        if there is overlap and rhs_code is not INTEGER_CST.  For
        non-overlapping stores terminate group if rhs is not mergeable.
        (get_alias_type_for_stmts): Change first argument from
        auto_vec<gimple *> & to vec<gimple *> &.  Add IS_LOAD, CLIQUEP and
        BASEP arguments.  If IS_LOAD is true, look at rhs1 of the stmts
        instead of lhs.  Compute *CLIQUEP and *BASEP in addition to the
        alias type.
        (get_location_for_stmts): Change first argument from
        auto_vec<gimple *> & to vec<gimple *> &.
        (struct split_store): Remove orig_stmts data member, add orig_stores.
        (split_store::split_store): Create orig_stores rather than orig_stmts.
        (find_constituent_stmts): Renamed to ...
        (find_constituent_stores): ... this.  Change second argument from
        vec<gimple *> * to vec<store_immediate_info *> *, push pointers
        to info structures rather than the statements.
        (split_group): Rename ALLOW_UNALIGNED argument to
        ALLOW_UNALIGNED_STORE, add ALLOW_UNALIGNED_LOAD argument and handle
        it.  Adjust find_constituent_stores caller.
        (imm_store_chain_info::output_merged_store): Handle rhs_code other
        than INTEGER_CST, adjust split_group, get_alias_type_for_stmts and
        get_location_for_stmts callers.  Set MR_DEPENDENCE_CLIQUE and
        MR_DEPENDENCE_BASE on the MEM_REFs if they are the same in all stores.
        (mem_valid_for_store_merging): New function.
        (handled_load): New function.
        (pass_store_merging::process_store): New method.
        (pass_store_merging::execute): Use process_store method.  Adjust
        terminate_all_aliasing_chains caller.

        * gcc.dg/store_merging_13.c: New test.
        * gcc.dg/store_merging_14.c: New test.


<snip>

@@ -920,6 +980,107 @@ pass_store_merging::terminate_and_releas
    return ret;
  }
+/* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
+   may clobber REF.  FIRST and LAST must be in the same basic block and
+   have non-NULL vdef.  */
+
+bool
+stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
+{
+  ao_ref r;
+  ao_ref_init (&r, ref);
+  unsigned int count = 0;
+  tree vop = gimple_vdef (last);
+  gimple *stmt;
+
+  gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
+  do
+    {
+      stmt = SSA_NAME_DEF_STMT (vop);
+      if (stmt_may_clobber_ref_p_1 (stmt, &r))
+       return true;
+      if (++count > 64)
+       return true;

Magic number 64? I don't know if it's worth having a PARAM for it, but at
least a comment saying we're bailing out for compile-time considerations
would be good; a rough sketch of what I mean follows after the quoted hunk.

+      vop = gimple_vuse (stmt);
+    }
+  while (stmt != first);
+  return false;
+}
+
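
Something along these lines would do, for example (the wording is only a
suggestion, the limit itself can stay as is):

      /* Bail out conservatively after walking a fixed number of
         statements, purely for compile-time reasons.  */
      if (++count > 64)
        return true;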

<snip>

+         tree ops[2];
+         for (int j = 0;
+              j < 1 + (split_store->orig_stores[0]->ops[1].val != NULL_TREE);
+              ++j)
+           {
+             store_operand_info &op = split_store->orig_stores[0]->ops[j];
+             if (op.base_addr)
+               {
+                 FOR_EACH_VEC_ELT (split_store->orig_stores, k, info)
+                   orig_stmts.safe_push (info->ops[j].stmt);
+
+                 offset_type = get_alias_type_for_stmts (orig_stmts, true,
+                                                         &clique, &base);
+                 location_t load_loc = get_location_for_stmts (orig_stmts);
+                 orig_stmts.truncate (0);
+
+                 unsigned HOST_WIDE_INT load_align = group->load_align[j];
+                 unsigned HOST_WIDE_INT align_bitpos
+                   = (try_pos * BITS_PER_UNIT
+                      - split_store->orig_stores[0]->bitpos
+                      + op.bitpos) & (load_align - 1);
+                 if (align_bitpos)
+                   load_align = least_bit_hwi (align_bitpos);
+
+                 tree load_int_type
+                   = build_nonstandard_integer_type (try_size, UNSIGNED);
+                 load_int_type
+                   = build_aligned_type (load_int_type, load_align);
+
+                 unsigned HOST_WIDE_INT load_pos
+                   = (try_pos * BITS_PER_UNIT
+                      - split_store->orig_stores[0]->bitpos
+                      + op.bitpos) / BITS_PER_UNIT;
+                 ops[j] = fold_build2 (MEM_REF, load_int_type, load_addr[j],
+                                       build_int_cst (offset_type, load_pos));
+                 if (TREE_CODE (ops[j]) == MEM_REF)
+                   {
+                     MR_DEPENDENCE_CLIQUE (ops[j]) = clique;
+                     MR_DEPENDENCE_BASE (ops[j]) = base;
+                   }
+                 if (!integer_zerop (mask))
+                   TREE_NO_WARNING (ops[j]) = 1;

Please add a comment here explaining why we need TREE_NO_WARNING.
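
My guess is that with a non-zero MASK the generated load reads bits that
the original code never accessed (e.g. uninitialized padding), and
TREE_NO_WARNING is there to avoid spurious uninitialized-use warnings on
it.  If that's the reason, something like this would be enough:

                 if (!integer_zerop (mask))
                   /* The load might read bits the original code never
                      accessed, so avoid uninitialized-use warnings
                      about it.  */
                   TREE_NO_WARNING (ops[j]) = 1;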

Thanks,
Kyrill
