On Fri, Nov 03, 2017 at 02:14:39PM +0100, Richard Biener wrote:
> > +/* Return true if stmts in between FIRST (inclusive) and LAST (exclusive)
> > +   may clobber REF.  FIRST and LAST must be in the same basic block and
> > +   have non-NULL vdef.  */
> > +
> > +bool
> > +stmts_may_clobber_ref_p (gimple *first, gimple *last, tree ref)
> > +{
> > +  ao_ref r;
> > +  ao_ref_init (&r, ref);
> > +  unsigned int count = 0;
> > +  tree vop = gimple_vdef (last);
> > +  gimple *stmt;
> > +
> > +  gcc_checking_assert (gimple_bb (first) == gimple_bb (last));
>
> EBB would probably work as well, thus we should assert we do not
> end up visiting a PHI in the loop?
For a general purpose routine sure, but this one is in an anonymous
namespace and meant for use only in this pass.  There it only checks stores
from the same store group and any other stores intermixed between them.
The pass, at least right now, resets all of its state at the end of each
basic block, so gimple_bb (first) == gimple_bb (last) is indeed always
guaranteed.  If we ever extend it so that we don't have this guarantee,
this assert would fail and then of course it should be adjusted to handle
whatever is needed.  But do we need to do that right now?  Note that
extending store-merging to handle groups of stores within an EBB doesn't
look useful, because then not all the stores would be unconditional.
What could make sense is e.g. a diamond

      |
     bb1
    /   \
  bb2   bb3
    \   /
     bb4
      |

where the stores are in bb1 and bb4 and no stores in bb2 or bb3 can alias
with them.  But then we'd likely need a full-blown walk_aliased_vdefs for
this...

> > +	  gimple *first = merged_store->first_stmt;
> > +	  gimple *last = merged_store->last_stmt;
> > +	  unsigned int i;
> > +	  store_immediate_info *infoc;
> > +	  if (info->order < merged_store->first_order)
> > +	    {
> > +	      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
> > +		if (stmts_may_clobber_ref_p (info->stmt, first, infoc->ops[idx].val))
> > +		  return false;
> > +	      first = info->stmt;
> > +	    }
> > +	  else if (info->order > merged_store->last_order)
> > +	    {
> > +	      FOR_EACH_VEC_ELT (merged_store->stores, i, infoc)
> > +		if (stmts_may_clobber_ref_p (last, info->stmt, infoc->ops[idx].val))
> > +		  return false;
> > +	      last = info->stmt;
> > +	    }
> > +	  if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
> > +	    return false;
>
> Can you comment on what you check in this block?  It first checks
> all stmts (but not info->stmt itself if it is after last!?) against
> all stores that would be added when adding 'info'.  Then it checks
> from new first to last against the newly added stmt (again
> excluding that stmt if it was added last).

The stmts_may_clobber_ref_p routine doesn't check aliasing on the last
stmt, only on the first stmt and the stmts in between.  Previous iterations
have checked, FOR_EACH_VEC_ELT (merged_store->stores, i, infoc), that
merged_store->first_stmt and the stmts between it and
merged_store->last_stmt don't clobber any of the infoc->ops[idx].val
references, and we want to maintain that invariant when we add another
store to the group.  So, if we are about to extend the range between
first_stmt and last_stmt, we need to check all the earlier refs against the
stmts we've added to the range.  Note that at this point the stores are
sorted by bitpos, not by their order within the basic block, so it is
possible that a store with a higher bitpos extends the range to earlier
stmts or to later stmts.  And finally the
  if (stmts_may_clobber_ref_p (first, last, info->ops[idx].val))
checks the reference we are adding against the whole new range.
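
To make the aliasing constraint concrete, here is a hypothetical C example
(not from the patch or its testsuite, names made up): the two narrow copies
below can only be merged into one wider load plus one wider store if the
first store cannot clobber the load feeding the second store, which is what
the checks described above have to verify when p and q might overlap.

struct S { unsigned char a, b; };

void
copy2 (struct S *p, struct S *q)
{
  p->a = q->a;	/* load q->a, store p->a.  */
  p->b = q->b;	/* load q->b; the store to p->a above must not clobber
		   this load for the merge to be valid.  */
}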

> > +  if (offset != NULL_TREE)
> > +    {
> > +      /* If the access is variable offset then a base decl has to be
> > +	 address-taken to be able to emit pointer-based stores to it.
> > +	 ??? We might be able to get away with re-using the original
> > +	 base up to the first variable part and then wrapping that inside
> > +	 a BIT_FIELD_REF.  */
>
> Yes, that's what I'd generally recommend... OTOH it can get quite
> fugly but it only has to survive until RTL expansion...

This is a preexisting comment that I've just moved from the
pass_store_merging::execute method into a helper function (execute grew too
large and needed too deep indentation, and furthermore I wanted to use the
helper for the loads too).  I haven't really changed anything there.

> As extra sanity check I'd rather have that all refs share a common
> base (operand-equal-p'ish).  But I guess that's what usually will
> happen anyways.  The alias-ptr-type trick will be tricky to do
> here as well (you have to go down to the base MEM_REF, wrap
> a decl if there's no MEM_REF and adjust the offset type).

For the aliasing, I have an untested incremental patch; I need to finish
testcases for it, then test it, and then I can post it.

> given the style of processing we can end up doing more than
> necessary work when following ! single-use chains here, no?
> Would it make sense to restrict this to single-uses or do we
> handle any case of ! single-uses?  When extending things to
> allow an intermediate swap or general permute we could handle
> a byte/word-splat for example.

Single-use vs. multiple uses is something I've thought about, but I don't
know whether it is better to require single-use or not (or to require it
only sometimes, under some condition?).  Say if we have:
  _1 = t.a;
  _2 = t.b;
  _3 = t.c;
  _4 = t.d;
  s.a = _1;
  s.b = _2;
  s.c = _3;
  s.d = _4;
  use (_1, _2, _3, _4);
it will likely be shorter and maybe faster if we do:
  _1 = t.a;
  _2 = t.b;
  _3 = t.c;
  _4 = t.d;
  _5 = load[t.a...t.d];
  store[s.a...s.d] = _5;
  use (_1, _2, _3, _4);
If there are just 2 stores combined together, then
  _1 = t.a;
  _2 = t.b;
  _5 = load[t.a...t.b];
  store[s.a...s.b] = _5;
  use (_1, _2);
has as many stmts as before.  And if there is a BIT_*_EXPR, we would be
adding not just the loads, but also the bitwise ops.  So, if you want, I can
at least for now add has_single_use checks in all the spots where the pass
follows SSA_NAME_DEF_STMT.
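
As a source-level illustration (a hypothetical example, not from the patch
or its testsuite, with made-up names), the multiple-uses case above roughly
corresponds to:

struct T { unsigned char a, b, c, d; };
extern struct T s, t;
extern void use (int, int, int, int);

void
foo (void)
{
  /* Each loaded byte feeds both the store group and the later call, so
     the narrow loads stay live even if the four stores are merged into
     one wider load + store; with only two such stores the merged form
     would not have fewer stmts than the original.  */
  unsigned char a = t.a, b = t.b, c = t.c, d = t.d;
  s.a = a;
  s.b = b;
  s.c = c;
  s.d = d;
  use (a, b, c, d);
}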

> Otherwise the patch looks good.  Quite some parts of the changes
> seem to be due to splitting out stuff into functions and refactoring.

Indeed, the mem_valid_for_store_merging and pass_store_merging::process_store
functions are mostly code move + reindent + tweaks.  Here are hand-made diffs
(diff -upbd) between the old portions of pass_store_merging::execute
corresponding to the above two functions and the new functions, if that is of
any help.

@@ -1,33 +1,36 @@
-	  HOST_WIDE_INT bitsize, bitpos;
+static tree
+mem_valid_for_store_merging (tree mem, unsigned HOST_WIDE_INT *pbitsize,
+			     unsigned HOST_WIDE_INT *pbitpos,
+			     unsigned HOST_WIDE_INT *pbitregion_start,
+			     unsigned HOST_WIDE_INT *pbitregion_end)
+{
+  HOST_WIDE_INT bitsize;
+  HOST_WIDE_INT bitpos;
   unsigned HOST_WIDE_INT bitregion_start = 0;
   unsigned HOST_WIDE_INT bitregion_end = 0;
   machine_mode mode;
   int unsignedp = 0, reversep = 0, volatilep = 0;
-	  tree offset, base_addr;
-	  base_addr
-	    = get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
+  tree offset;
+  tree base_addr = get_inner_reference (mem, &bitsize, &bitpos, &offset, &mode,
					 &unsignedp, &reversep, &volatilep);
-	  if (TREE_CODE (lhs) == COMPONENT_REF
-	      && DECL_BIT_FIELD_TYPE (TREE_OPERAND (lhs, 1)))
+  *pbitsize = bitsize;
+  if (bitsize == 0)
+    return NULL_TREE;
+
+  if (TREE_CODE (mem) == COMPONENT_REF
+      && DECL_BIT_FIELD_TYPE (TREE_OPERAND (mem, 1)))
     {
-	      get_bit_range (&bitregion_start, &bitregion_end, lhs,
-			     &bitpos, &offset);
+      get_bit_range (&bitregion_start, &bitregion_end, mem, &bitpos, &offset);
       if (bitregion_end)
	 ++bitregion_end;
     }

-	  if (bitsize == 0)
-	    continue;
-
-	  /* As a future enhancement we could handle stores with the same
-	     base and offset.  */
-	  bool invalid = reversep
-			 || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
-			     && (TREE_CODE (rhs) != INTEGER_CST))
-			 || !rhs_valid_for_store_merging_p (rhs);
+  if (reversep)
+    return NULL_TREE;

   /* We do not want to rewrite TARGET_MEM_REFs.  */
   if (TREE_CODE (base_addr) == TARGET_MEM_REF)
-    invalid = true;
+    return NULL_TREE;
   /* In some cases get_inner_reference may return a
      MEM_REF [ptr + byteoffset].  For the purposes of this pass
      canonicalize the base_addr to MEM_REF [ptr] and take
@@ -60,7 +63,7 @@
	    }
	}
      else
-	invalid = true;
+	return NULL_TREE;
      base_addr = TREE_OPERAND (base_addr, 0);
    }
  /* get_inner_reference returns the base object, get at its
@@ -68,7 +71,7 @@
   else
     {
       if (bitpos < 0)
-	invalid = true;
+	return NULL_TREE;
       base_addr = build_fold_addr_expr (base_addr);
     }

@@ -78,23 +81,25 @@
       bitregion_end = ROUND_UP (bitpos + bitsize, BITS_PER_UNIT);
     }

-  if (! invalid
-      && offset != NULL_TREE)
+  if (offset != NULL_TREE)
     {
-      /* If the access is variable offset then a base
-	 decl has to be address-taken to be able to
-	 emit pointer-based stores to it.
-	 ??? We might be able to get away with
-	 re-using the original base up to the first
-	 variable part and then wrapping that inside
+      /* If the access is variable offset then a base decl has to be
+	 address-taken to be able to emit pointer-based stores to it.
+	 ??? We might be able to get away with re-using the original
+	 base up to the first variable part and then wrapping that inside
	 a BIT_FIELD_REF.  */
       tree base = get_base_address (base_addr);
       if (! base
-	  || (DECL_P (base)
-	      && ! TREE_ADDRESSABLE (base)))
-	invalid = true;
-      else
-	base_addr = build2 (POINTER_PLUS_EXPR,
-			    TREE_TYPE (base_addr),
+	  || (DECL_P (base) && ! TREE_ADDRESSABLE (base)))
+	return NULL_TREE;
+
+      base_addr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (base_addr),
			   base_addr, offset);
     }
+
+  *pbitsize = bitsize;
+  *pbitpos = bitpos;
+  *pbitregion_start = bitregion_start;
+  *pbitregion_end = bitregion_end;
+  return base_addr;
+}

------

@@ -1,71 +1,129 @@
+void
+pass_store_merging::process_store (gimple *stmt)
+{
   tree lhs = gimple_assign_lhs (stmt);
   tree rhs = gimple_assign_rhs1 (stmt);
+  unsigned HOST_WIDE_INT bitsize, bitpos;
+  unsigned HOST_WIDE_INT bitregion_start;
+  unsigned HOST_WIDE_INT bitregion_end;
+  tree base_addr
+    = mem_valid_for_store_merging (lhs, &bitsize, &bitpos,
+				   &bitregion_start, &bitregion_end);
+  if (bitsize == 0)
+    return;

-	  /* As a future enhancement we could handle stores with the same
-	     base and offset.  */
-	  bool invalid = reversep
+  bool invalid = (base_addr == NULL_TREE
		  || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
-			 && (TREE_CODE (rhs) != INTEGER_CST))
-			|| !rhs_valid_for_store_merging_p (rhs);
+		      && (TREE_CODE (rhs) != INTEGER_CST)));
+  enum tree_code rhs_code = ERROR_MARK;
+  store_operand_info ops[2];
+  if (invalid)
+    ;
+  else if (rhs_valid_for_store_merging_p (rhs))
+    {
+      rhs_code = INTEGER_CST;
+      ops[0].val = rhs;
+    }
+  else if (TREE_CODE (rhs) != SSA_NAME)
+    invalid = true;
+  else
+    {
+      gimple *def_stmt = SSA_NAME_DEF_STMT (rhs), *def_stmt1, *def_stmt2;
+      if (!is_gimple_assign (def_stmt))
+	invalid = true;
+      else if (handled_load (def_stmt, &ops[0], bitsize, bitpos,
+			     bitregion_start, bitregion_end))
+	rhs_code = MEM_REF;
+      else
+	switch ((rhs_code = gimple_assign_rhs_code (def_stmt)))
+	  {
+	  case BIT_AND_EXPR:
+	  case BIT_IOR_EXPR:
+	  case BIT_XOR_EXPR:
+	    tree rhs1, rhs2;
+	    rhs1 = gimple_assign_rhs1 (def_stmt);
+	    rhs2 = gimple_assign_rhs2 (def_stmt);
+	    invalid = true;
+	    if (TREE_CODE (rhs1) != SSA_NAME)
+	      break;
+	    def_stmt1 = SSA_NAME_DEF_STMT (rhs1);
+	    if (!is_gimple_assign (def_stmt1)
+		|| !handled_load (def_stmt1, &ops[0], bitsize, bitpos,
+				  bitregion_start, bitregion_end))
+	      break;
+	    if (rhs_valid_for_store_merging_p (rhs2))
+	      ops[1].val = rhs2;
+	    else if (TREE_CODE (rhs2) != SSA_NAME)
+	      break;
+	    else
+	      {
+		def_stmt2 = SSA_NAME_DEF_STMT (rhs2);
+		if (!is_gimple_assign (def_stmt2))
+		  break;
+		else if (!handled_load (def_stmt2, &ops[1], bitsize, bitpos,
+					bitregion_start, bitregion_end))
+		  break;
+	      }
+	    invalid = false;
+	    break;
+	  default:
+	    invalid = true;
+	    break;
+	  }
+    }

-	  struct imm_store_chain_info **chain_info
-	    = m_stores.get (base_addr);
+  struct imm_store_chain_info **chain_info = NULL;
+  if (base_addr)
+    chain_info = m_stores.get (base_addr);

-	  if (!invalid)
+  if (invalid)
     {
+      terminate_all_aliasing_chains (chain_info, stmt);
+      return;
+    }
+
   store_immediate_info *info;
   if (chain_info)
     {
       unsigned int ord = (*chain_info)->m_store_info.length ();
-	      info = new store_immediate_info (bitsize, bitpos,
-					       bitregion_start,
-					       bitregion_end,
-					       stmt, ord);
+      info = new store_immediate_info (bitsize, bitpos, bitregion_start,
+				       bitregion_end, stmt, ord, rhs_code,
+				       ops[0], ops[1]);
       if (dump_file && (dump_flags & TDF_DETAILS))
	 {
-		  fprintf (dump_file,
-			   "Recording immediate store from stmt:\n");
+	   fprintf (dump_file, "Recording immediate store from stmt:\n");
	   print_gimple_stmt (dump_file, stmt, 0);
	 }
       (*chain_info)->m_store_info.safe_push (info);
-	      /* If we reach the limit of stores to merge in a chain
-		 terminate and process the chain now.  */
+      /* If we reach the limit of stores to merge in a chain terminate and
+	 process the chain now.  */
       if ((*chain_info)->m_store_info.length ()
-		  == (unsigned int)
-		     PARAM_VALUE (PARAM_MAX_STORES_TO_MERGE))
+	  == (unsigned int) PARAM_VALUE (PARAM_MAX_STORES_TO_MERGE))
	 {
	   if (dump_file && (dump_flags & TDF_DETAILS))
	     fprintf (dump_file,
-		      "Reached maximum number of statements"
-		      " to merge:\n");
+		      "Reached maximum number of statements to merge:\n");
	   terminate_and_release_chain (*chain_info);
	 }
-	      continue;
+      return;
     }

   /* Store aliases any existing chain?  */
-	  terminate_all_aliasing_chains (chain_info, false, stmt);
+  terminate_all_aliasing_chains (chain_info, stmt);

   /* Start a new chain.  */
   struct imm_store_chain_info *new_chain
     = new imm_store_chain_info (m_stores_head, base_addr);
-	  info = new store_immediate_info (bitsize, bitpos,
-					   bitregion_start,
-					   bitregion_end,
-					   stmt, 0);
+  info = new store_immediate_info (bitsize, bitpos, bitregion_start,
+				   bitregion_end, stmt, 0, rhs_code,
+				   ops[0], ops[1]);
   new_chain->m_store_info.safe_push (info);
   m_stores.put (base_addr, new_chain);
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-	      fprintf (dump_file,
-		       "Starting new chain with statement:\n");
+      fprintf (dump_file, "Starting new chain with statement:\n");
       print_gimple_stmt (dump_file, stmt, 0);
       fprintf (dump_file, "The base object is:\n");
       print_generic_expr (dump_file, base_addr);
       fprintf (dump_file, "\n");
     }
-	}
-      else
-	terminate_all_aliasing_chains (chain_info,
-				       offset != NULL_TREE, stmt);
-
-      continue;
+}

	Jakub