Hi Richard,
Thanks for the detailed comments; I'll be working on addressing them.
Below are answers to some of your questions:
On 29/09/16 11:45, Richard Biener wrote:
+
+ /* If we're inserting a non-bytesized width or not at a byte boundary
+ use an intermediate wide_int to perform the bit-insertion correctly. */
+ if (sub_byte_op_p
+ || (TREE_CODE (expr) == INTEGER_CST
+ && mode_for_size (bitlen, MODE_INT, 0) == BLKmode))
I wonder when we have BLKmode here ...
+ {
+ unsigned int byte_size = last_byte - first_byte + 1;
+
+ /* The functions native_encode_expr/native_interpret_expr use the
+ TYPE_MODE of the type to determine the number of bytes to write/read,
+ so if we want to process a number of bytes that does not have a
+ TYPE_MODE of equal size we need to use a type that has a valid mode
+ for it. */
+
+ machine_mode mode
+ = smallest_mode_for_size (byte_size * BITS_PER_UNIT, MODE_INT);
+ tree dest_int_type
+ = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), UNSIGNED);
+ byte_size = GET_MODE_SIZE (mode);
... how we ever get non-BLKmode here.
smallest_mode_for_size is guaranteed to never return BLKmode.
It returns a mode that contains at least the requested number of bits.
mode_for_size returns BLKmode if no mode fits the exact number of bits.
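To illustrate the difference for anyone following along, here is a standalone
toy model of that contract (just an illustration with made-up analogue
functions, not the GCC code itself):

    /* Toy model only: an exact-size lookup can fail (the BLKmode case),
       while the at-least-size lookup always rounds up to the next
       available integer mode.  */
    #include <cstdio>

    static const int int_mode_bits[] = { 8, 16, 32, 64, 128 };

    /* Analogue of mode_for_size: exact match or -1 ("BLKmode").  */
    static int exact_mode (int bits)
    {
      for (int b : int_mode_bits)
        if (b == bits)
          return b;
      return -1;
    }

    /* Analogue of smallest_mode_for_size: smallest mode of >= BITS bits.  */
    static int at_least_mode (int bits)
    {
      for (int b : int_mode_bits)
        if (b >= bits)
          return b;
      return -1;
    }

    int main ()
    {
      /* 24 bits: exact lookup fails (-1), at-least lookup gives 32.  */
      printf ("%d %d\n", exact_mode (24), at_least_mode (24));
      /* 32 bits: both give 32.  */
      printf ("%d %d\n", exact_mode (32), at_least_mode (32));
    }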
+ /* The region from the byte array that we're inserting into. */
+ tree ptr_wide_int
+ = native_interpret_expr (dest_int_type, ptr + first_byte,
+ total_bytes);
+
+ gcc_assert (ptr_wide_int);
+ wide_int dest_wide_int
+ = wi::to_wide (ptr_wide_int, TYPE_PRECISION (dest_int_type));
+ wide_int expr_wide_int
+ = wi::to_wide (tmp_int, byte_size * BITS_PER_UNIT);
+ if (BYTES_BIG_ENDIAN)
+ {
+ unsigned int insert_pos
+ = byte_size * BITS_PER_UNIT - bitlen - (bitpos % BITS_PER_UNIT);
+ dest_wide_int
+ = wi::insert (dest_wide_int, expr_wide_int, insert_pos, bitlen);
+ }
+ else
+ dest_wide_int = wi::insert (dest_wide_int, expr_wide_int,
+ bitpos % BITS_PER_UNIT, bitlen);
+
+ tree res = wide_int_to_tree (dest_int_type, dest_wide_int);
+ native_encode_expr (res, ptr + first_byte, total_bytes, 0);
+
OTOH this whole dance looks as complicated and way more expensive than
using native_encode_expr into a temporary buffer and then a
manually implemented "bit-merging" of it at ptr + first_byte + bitpos.
AFAICS that operation is even endianness agnostic.
I did try implementing that, but it kept blowing up in big-endian.
I found it to be very fiddly. I can try again, but we'll see how it goes...
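For reference, the kind of helper I mean looks roughly like this (a minimal
untested sketch; the name merge_bits is made up, and it assumes bits are
numbered from the least significant bit of each buffer byte, which is exactly
the assumption that doesn't line up with the byte order native_encode_expr
produces on big-endian):

    /* Merge BITLEN bits taken from the start of SRC into DST at bit
       position BITPOS.  Sketch only: bit 0 is taken to be the LSB of
       byte 0, which is the part that needs care on big-endian.  */
    static void
    merge_bits (unsigned char *dst, const unsigned char *src,
                unsigned int bitpos, unsigned int bitlen)
    {
      for (unsigned int i = 0; i < bitlen; i++)
        {
          unsigned int bit = (src[i / 8] >> (i % 8)) & 1;
          unsigned int d = bitpos + i;
          dst[d / 8] = (dst[d / 8] & ~(1u << (d % 8))) | (bit << (d % 8));
        }
    }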
<snip>
+
+bool
+pass_store_merging::terminate_and_process_all_chains (basic_block bb)
+{
+ hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator iter
+ = m_stores.begin ();
+ bool ret = false;
+ for (; iter != m_stores.end (); ++iter)
+ ret |= terminate_and_release_chain ((*iter).first);
+
+ if (ret)
+ gimple_purge_dead_eh_edges (bb);
Why do you need this? I don't see how merging stores should affect EH
edges at all -- unless you are removing EH side-effects (ISTR you
are not merging cross-BB).
I was seeing a testsuite failure in the C++ testsuite,
an ICE about EH edges. However, when I re-ran the testsuite today
without the gimple_purge_dead_eh_edges call I didn't see any fallout...
I have been rebasing the patch against trunk regularly, so maybe something
changed in the underlying trunk since then...
<snip>
+
+bool
+imm_store_chain_info::coalesce_immediate_stores ()
+{
+ /* Anything less can't be processed. */
+ if (m_store_info.length () < 2)
+ return false;
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "Attempting to coalesce %u stores in chain.\n",
+ m_store_info.length ());
+
+ store_immediate_info *info;
+ unsigned int i;
+
+ /* Order the stores by the bitposition they write to. */
+ m_store_info.qsort (sort_by_bitpos);
+
+ info = m_store_info[0];
+ merged_store_group *merged_store = new merged_store_group (info);
+ m_merged_store_groups.safe_push (merged_store);
+
+ FOR_EACH_VEC_ELT (m_store_info, i, info)
+ {
+ if (dump_file)
+ {
+ fprintf (dump_file, "Store %u:\nbitsize:" HOST_WIDE_INT_PRINT_DEC
+ " bitpos:" HOST_WIDE_INT_PRINT_DEC " val:\n",
+ i, info->bitsize, info->bitpos);
+ print_generic_expr (dump_file, info->val, 0);
+ fprintf (dump_file, "\n------------\n");
+ }
+
+ if (i == 0)
+ continue;
+
+ /* |---store 1---|
+ |---store 2---|
+ Overlapping stores. */
+ unsigned HOST_WIDE_INT start = info->bitpos;
+ if (IN_RANGE (start, merged_store->start,
+ merged_store->start + merged_store->width - 1))
+ {
+ merged_store->merge_overlapping (info);
+ continue;
+ }
+
+ /* |---store 1---| <gap> |---store 2---|.
+ Gap between stores. Start a new group. */
+ if (start != merged_store->start + merged_store->width)
+ {
+ merged_store = new merged_store_group (info);
+ m_merged_store_groups.safe_push (merged_store);
So this will create merged_store_group even for single-store groups
(all stores have gaps between them)? Can't we optimize this maybe
common case to not perform the memory allocation? Like handle
this case first and lazily allocate a new group. Looks like
apply_stores also doesn't bail out early for single-element groups.
Yeah, we can fast-path single-store groups.
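Untested sketch of the shape I have in mind, reusing the names from the patch
above (the actual merging of INFO into the group stays as in the patch):

    /* Keep the first store of a prospective group as a plain
       store_immediate_info; only allocate the merged_store_group once a
       second store is known to overlap or abut it, so singleton groups
       never cost an allocation.  */
    store_immediate_info *pending = m_store_info[0];
    merged_store_group *merged_store = NULL;

    FOR_EACH_VEC_ELT (m_store_info, i, info)
      {
        if (i == 0)
          continue;

        unsigned HOST_WIDE_INT group_start
          = merged_store ? merged_store->start : pending->bitpos;
        unsigned HOST_WIDE_INT group_end
          = group_start + (merged_store ? merged_store->width
                                        : pending->bitsize);

        if (info->bitpos > group_end)
          {
            /* Gap: the previous store stays a singleton.  */
            pending = info;
            merged_store = NULL;
            continue;
          }

        if (!merged_store)
          {
            merged_store = new merged_store_group (pending);
            m_merged_store_groups.safe_push (merged_store);
          }
        /* ... merge INFO into MERGED_STORE as in the patch above ...  */
      }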
+
+ num_stmts++;
+
+ try_pos += try_size / BITS_PER_UNIT;
+
+ wi_offset += try_size;
+ size -= try_size;
+ align = try_size;
+ while (size < try_size)
+ try_size /= 2;
+
+ if (num_stmts >= orig_num_stmts)
+ {
Hmm. I think that if you'd end up with an unchanged stmt simply leave it
there. I don't see how you'd ever end up with more stmts than in the
original source?
We never end up with more statements, so this could
be if (num_stmts == orig_num_stmts) to catch the case
where we can't improve on the original statement count.
+ return false;
+
+ return code != VOID_CST;
Huh, do you really hit this?
Not really, I just looked through tree.def to see which tree codes
were tcc_constant and removed those that are handled by native_encode_expr.
I can remove this.
+ || !rhs_valid_for_store_merging_p (rhs)
+ || TREE_THIS_VOLATILE (base_addr);
TREE_THIS_VOLATILE is handled above already.
I found that it isn't, and that caused various issues.
While the stmt didn't trigger gimple_has_volatile_ops, the result of
get_inner_reference could still be TREE_THIS_VOLATILE. Not doing this check
caused havoc later on (duplicate entries in the hash table for the same
tree, because apparently volatile trees don't hash to the same location).
Thanks,
Kyrill