On Thu, 29 Sep 2016, Kyrill Tkachov wrote:

> Hi Richard,
>
> Thanks for the detailed comments, I'll be working on addressing them.
> Below are answers to some of your questions:
>
> On 29/09/16 11:45, Richard Biener wrote:
> >
> > +
> > +      /* If we're inserting a non-bytesized width or not at a byte boundary
> > +         use an intermediate wide_int to perform the bit-insertion
> > +         correctly.  */
> > +      if (sub_byte_op_p
> > +          || (TREE_CODE (expr) == INTEGER_CST
> > +              && mode_for_size (bitlen, MODE_INT, 0) == BLKmode))
> >
> > I wonder when we have BLKmode here ...
> >
> > > +        {
> > > +          unsigned int byte_size = last_byte - first_byte + 1;
> > > +
> > > +          /* The functions native_encode_expr/native_interpret_expr uses the
> > > +             TYPE_MODE of the type to determine the number of bytes to write/read
> > > +             so if we want to process a number of bytes that does not have a
> > > +             TYPE_MODE of equal size we need to use a type that has a valid mode
> > > +             for it.  */
> > > +
> > > +          machine_mode mode
> > > +            = smallest_mode_for_size (byte_size * BITS_PER_UNIT, MODE_INT);
> > > +          tree dest_int_type
> > > +            = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), UNSIGNED);
> > > +          byte_size = GET_MODE_SIZE (mode);
> >
> > ... how we ever get non-BLKmode here.
>
> smallest_mode_for_size is guaranteed to never return BLKmode.
> It returns a mode that contains at least the requested number of bits.
> mode_for_size returns BLKmode if no mode fits the exact number of bits.
But then it's really about GET_MODE_SIZE (TYPE_MODE (expr)) vs.
TYPE_PRECISION (expr).  I see no need to invoke [smallest_]mode_for_size.

> > > +          /* The region from the byte array that we're inserting into.  */
> > > +          tree ptr_wide_int
> > > +            = native_interpret_expr (dest_int_type, ptr + first_byte,
> > > +                                     total_bytes);
> > > +
> > > +          gcc_assert (ptr_wide_int);
> > > +          wide_int dest_wide_int
> > > +            = wi::to_wide (ptr_wide_int, TYPE_PRECISION (dest_int_type));
> > > +          wide_int expr_wide_int
> > > +            = wi::to_wide (tmp_int, byte_size * BITS_PER_UNIT);
> > > +          if (BYTES_BIG_ENDIAN)
> > > +            {
> > > +              unsigned int insert_pos
> > > +                = byte_size * BITS_PER_UNIT - bitlen - (bitpos % BITS_PER_UNIT);
> > > +              dest_wide_int
> > > +                = wi::insert (dest_wide_int, expr_wide_int, insert_pos, bitlen);
> > > +            }
> > > +          else
> > > +            dest_wide_int = wi::insert (dest_wide_int, expr_wide_int,
> > > +                                        bitpos % BITS_PER_UNIT, bitlen);
> > > +
> > > +          tree res = wide_int_to_tree (dest_int_type, dest_wide_int);
> > > +          native_encode_expr (res, ptr + first_byte, total_bytes, 0);
> > > +
> >
> > OTOH this whole dance looks as complicated and way more expensive than
> > using native_encode_expr into a temporary buffer and then a manually
> > implemented "bit-merging" of it at ptr + first_byte + bitpos.
> > AFAICS that operation is even endianness-agnostic.
>
> I did try implementing that, but it kept blowing up in big-endian.

I'm really curious how ;)  And I'm willing to help debugging in case you
have sth basic working.  I suggest working at byte granularity here
(bytes have no endianness!).

> I found it to be very fiddly.  I can try again, but we'll see how it
> goes...
> <snip>
>
> > > +
> > > +bool
> > > +pass_store_merging::terminate_and_process_all_chains (basic_block bb)
> > > +{
> > > +  hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator iter
> > > +    = m_stores.begin ();
> > > +  bool ret = false;
> > > +  for (; iter != m_stores.end (); ++iter)
> > > +    ret |= terminate_and_release_chain ((*iter).first);
> > > +
> > > +  if (ret)
> > > +    gimple_purge_dead_eh_edges (bb);
> >
> > Why do you need this?  I don't see how merging stores should affect EH
> > edges at all -- unless you are removing EH side-effects (ISTR you
> > are not merging cross-BB).
>
> I was seeing a testsuite failure in the C++ testsuite,
> an ICE about EH edges.  However, when I rerun the testsuite today
> without the gimple_purge_dead_eh_edges call I don't see any fallout...
> I have been rebasing the patch against trunk regularly, so maybe
> something changed in the underlying trunk since then...

I'd doubt that.  I don't see any check for whether a store may throw
internally, so that missing check certainly would explain things.  You'd
need -fnon-call-exceptions, but you'd also need a setup (interesting to
achieve) where one store of a group is marked as not throwing and one as
throwing.  I'd say throw some stmt_can_throw_internal checks in, mark
such stores as invalid, and remove the edge purging.  That should do
the trick.

> <snip>
>
> > +
> > +bool
> > +imm_store_chain_info::coalesce_immediate_stores ()
> > +{
> > +  /* Anything less can't be processed.  */
> > +  if (m_store_info.length () < 2)
> > +    return false;
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    fprintf (dump_file, "Attempting to coalesce %u stores in chain.\n",
> > +             m_store_info.length ());
> > +
> > +  store_immediate_info *info;
> > +  unsigned int i;
> > +
> > +  /* Order the stores by the bitposition they write to.
> > +     */
> > +  m_store_info.qsort (sort_by_bitpos);
> > +
> > +  info = m_store_info[0];
> > +  merged_store_group *merged_store = new merged_store_group (info);
> > +  m_merged_store_groups.safe_push (merged_store);
> > +
> > +  FOR_EACH_VEC_ELT (m_store_info, i, info)
> > +    {
> > +      if (dump_file)
> > +        {
> > +          fprintf (dump_file, "Store %u:\nbitsize:" HOST_WIDE_INT_PRINT_DEC
> > +                              " bitpos:" HOST_WIDE_INT_PRINT_DEC " val:\n",
> > +                   i, info->bitsize, info->bitpos);
> > +          print_generic_expr (dump_file, info->val, 0);
> > +          fprintf (dump_file, "\n------------\n");
> > +        }
> > +
> > +      if (i == 0)
> > +        continue;
> > +
> > +      /* |---store 1---|
> > +               |---store 2---|
> > +         Overlapping stores.  */
> > +      unsigned HOST_WIDE_INT start = info->bitpos;
> > +      if (IN_RANGE (start, merged_store->start,
> > +                    merged_store->start + merged_store->width - 1))
> > +        {
> > +          merged_store->merge_overlapping (info);
> > +          continue;
> > +        }
> > +
> > +      /* |---store 1---| <gap> |---store 2---|.
> > +         Gap between stores.  Start a new group.  */
> > +      if (start != merged_store->start + merged_store->width)
> > +        {
> > +          merged_store = new merged_store_group (info);
> > +          m_merged_store_groups.safe_push (merged_store);
> >
> > So this will create merged_store_group even for single-store groups
> > (all stores have gaps between them)?  Can't we optimize this maybe
> > common case to not perform the memory allocation?  Like handle
> > this case first and lazily allocate a new group.  Looks like
> > apply_stores also doesn't bail out early for single-element groups.
>
> Yeah, we can fast-path single-store groups.
>
> > > +
> > > +          num_stmts++;
> > > +
> > > +          try_pos += try_size / BITS_PER_UNIT;
> > > +
> > > +          wi_offset += try_size;
> > > +          size -= try_size;
> > > +          align = try_size;
> > > +          while (size < try_size)
> > > +            try_size /= 2;
> > > +
> > > +      if (num_stmts >= orig_num_stmts)
> > > +        {
> >
> > Hmm.  I think that if you'd end up with an unchanged stmt simply leave it
> > there.
> > I don't see how you'd ever end up with more stmts than in the
> > original source?
>
> We don't ever end up with more statements; this could
> be if (num_stmts == orig_num_stmts) to catch the case
> where we can't improve on the original statement count.
>
> > > +      return false;
> > > +
> > > +  return code != VOID_CST;
> >
> > Huh, do you really hit this?
>
> Not really, I just looked through tree.def for the tree codes that
> were tcc_constant and removed those that were handled by
> native_encode_expr.  I can remove this.
>
> > > +          || !rhs_valid_for_store_merging_p (rhs)
> > > +          || TREE_THIS_VOLATILE (base_addr);
> >
> > TREE_THIS_VOLATILE is handled above already.
>
> I found that it isn't, and that this caused various issues.
> While the stmt didn't trigger gimple_has_volatile_ops, the result of
> get_inner_reference could be TREE_THIS_VOLATILE.  Not doing this check
> caused havoc later on (duplicate entries in the hash table for the same
> tree, because apparently volatile trees don't hash to the same location).

Yes, you can have a MEM[&volatile-decl] that is actually not a volatile
access, but get_inner_reference will return volatile-decl as the base
(now marked TREE_THIS_VOLATILE).  The hashing will not consider those
equal.  I suppose this case doesn't matter enough to be worth "fixing"
in some way, but please add a comment above the TREE_THIS_VOLATILE
check explaining its reason.

Thanks,
Richard.