On Mon, 24 Oct 2016, Kyrill Tkachov wrote: > > On 21/10/16 19:37, Richard Biener wrote: > > On October 21, 2016 3:29:15 PM GMT+02:00, Kyrill Tkachov > > <kyrylo.tkac...@foss.arm.com> wrote: > > > Hi Richard, > > > > > > On 21/10/16 13:37, Richard Biener wrote: > > > > On Tue, 18 Oct 2016, Kyrill Tkachov wrote: > > > > > > > > > Hi Richard, > > > > > > > > > > This patch is a merge of [1] and [2] and implements the manual > > > merging of > > > > > bitfields > > > > > as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN > > > too. > > > > > It caused me a lot of headeache because the bit offset is counted > > >from the > > > > > most significant bit > > > > > in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN > > > looks > > > > > irrelevant for store merging > > > > > anyway as it's just used to described RTL extract operations). > > > > > I've included ASCII diagrams of the steps in the merging algorithm. > > > > Heh, thanks. > > > > > > > > > Bootstrapped and tested on arm-none-linux-gnueabihf, > > > aarch64-none-linux-gnu, > > > > > x86_64-unknown-linux-gnu. > > > > > Also tested on aarch64_be-none-elf. > > > > > > > > > > How does this version look now? > > > > Mostly good. For > > > > > > > > +bool > > > > +pass_store_merging::terminate_all_aliasing_chains (tree dest, tree > > > base, > > > > + gimple *stmt) > > > > +{ > > > > ... > > > > + /* Check if the assignment destination (BASE) is part of a store > > > chain. > > > > + This is to catch non-constant stores to destinations that may > > > be > > > > part > > > > + of a chain. */ > > > > + if (base) > > > > + { > > > > + chain_info = m_stores.get (base); > > > > + if (chain_info) > > > > + { > > > > + struct store_immediate_info *info; > > > > + unsigned int i; > > > > + FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info) > > > > + { > > > > + if (refs_may_alias_p (info->dest, dest)) > > > > + { > > > > > > > > I suppose the chain is not yet sorted in any way? > > > > > > > > At least for 'dest' which do not have a known constant offset we > > > > could do > > > > > > > > if (base) > > > > terminate_and_release_chain (base); > > > Do you mean when get_inner_reference returns non-NULL for POFFSET? > > Yes. > > > > > Or do you think we should try to look into dest in this function? > > > > > > > to speed things up? IIRC we do not terminate chains early in > > > > this phase when we have enough stores to form a group, so > > > > writing a testcase that triggers quadraticness would be as simple > > > > as having > > > > > > > > char a[1000000]; > > > > > > > > void foo () > > > > { > > > > a[0] = 1; > > > > a[1] = 2; > > > > .... > > > > a[9999999] = 3; > > > > } > > > > > > > > ? > > > > > > > > so I think you probably want to limit the number of stores you > > > > ever put onto a chain and if you reach that limit, terminate > > > > and release it? Like just choose 16 or 64? (and experiment > > > > with the above kind of testcases) > > > I was initially thinking of imposing such a limit as well but > > > later I thought we'd want to extend the output code to be able to emit > > > a memcpy (or memset) call for large regions, so detecting the largest > > > possible > > > regions would be needed. > > But then we need a better data structure here to avoid the quadraticness. > > For the testcase: > > char a[1000000]; > > void foo () > { > a[0] = 1; > a[1] = 2; > .... > a[9999999] = 3; > } > > and similar we don't have quadratic behaviour because constant stores to a > constant offset > from base (i.e. stores valid for merging) get automatically inserted into the > chain without > doing the alias checking (we keep track of their program order so that > overlapping stores > are merged properly). Since pushing into a vec is not a linear operation we're > fine.
Good. > Once I teach it to quickly terminate chains when encountering stores to base > with a variable > offset (like you suggest above) there is only one case of quadraticness: > We walk the stores in the chain (causing quadratic behaviour) when among > the valid mergeable stores we have interspersed stores to a constant offset > from base but that > do not store a constant value e.g. a[5] = x; > So the input needs to be something like: > > char a[16000]; > > void > foo (char x) > { > a[0] = 1; a[8000 + 0] = x; > a[1] = 1; a[8000 + 1] = x; > a[2] = 1; a[8000 + 2] = x; > a[3] = 1; a[8000 + 3] = x; > a[4] = 1; a[8000 + 4] = x; > ... > a[7999] = 1; a[8000 + 7999] = x; > } > > I've reproduced a slowdown (with store merging taking a disproportional > time in -ftime-report) with this case and I've verified that limiting the > maximum number of statements in the chain fixes it. OK - as said, the algorithm trivially looks quadratic in some cases and we should generally avoid such behavior (and it is easy to do so). > > > > > But that is not implemented yet (though I have > > > experimented > > > with it) so I can add a limit here. Should I just hardcode a limit or > > > should I make it > > > into a --param (MAX_STMTS_IN_STORE_MERGING_CHAIN or something)? > > A param is preferred. > > Thanks. I'll introduce the limit but, as mentioned above, the simple case > of inserting multiple stores is not quadratic. I just tried to thin of the most simple testcase to trigger it... Thanks, Richard. > Kyrill > > > > > > > + bit_off = byte_off << LOG2_BITS_PER_UNIT; > > > > + if (!wi::neg_p (bit_off) && wi::fits_shwi_p > > > (bit_off)) > > > > + { > > > > + bitpos += bit_off.to_shwi (); > > > > + > > > > > > > > I think you want bit_off += bitpos before the fits_shwi check > > > > otherwise this add may still overflow. > > > > > > > > + base_addr = copy_node (base_addr); > > > > + TREE_OPERAND (base_addr, 1) > > > > + = build_zero_cst (TREE_TYPE (TREE_OPERAND ( > > > > + base_addr, > > > 1))); > > > > I'd prefer > > > > > > > > base_addr = build2 (MEM_REF, ...); > > > > > > > > here. > > > Thanks for the feedback, > > > Kyrill > > > > > > > Thanks, > > > > Richard. > > > > > > > > > Thanks, > > > > > Kyrill > > > > > > > > > > [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html > > > > > [2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html > > > > > > > > > > 2016-10-18 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > > > > > > > > > PR middle-end/22141 > > > > > * Makefile.in (OBJS): Add gimple-ssa-store-merging.o. > > > > > * common.opt (fstore-merging): New Optimization option. > > > > > * opts.c (default_options_table): Add entry for > > > > > OPT_ftree_store_merging. > > > > > * fold-const.h (can_native_encode_type_p): Declare prototype. > > > > > * fold-const.c (can_native_encode_type_p): Define. > > > > > * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define. > > > > > * passes.def: Insert pass_tree_store_merging. > > > > > * tree-pass.h (make_pass_store_merging): Declare extern > > > > > prototype. > > > > > * gimple-ssa-store-merging.c: New file. > > > > > * doc/invoke.texi (Optimization Options): Document > > > > > -fstore-merging. > > > > > > > > > > 2016-10-18 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > > > > Jakub Jelinek <ja...@redhat.com> > > > > > Andrew Pinski <pins...@gmail.com> > > > > > > > > > > PR middle-end/22141 > > > > > PR rtl-optimization/23684 > > > > > * gcc.c-torture/execute/pr22141-1.c: New test. > > > > > * gcc.c-torture/execute/pr22141-2.c: Likewise. > > > > > * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging. > > > > > * gcc.target/aarch64/ldp_stp_4.c: Likewise. > > > > > * gcc.dg/store_merging_1.c: New test. > > > > > * gcc.dg/store_merging_2.c: Likewise. > > > > > * gcc.dg/store_merging_3.c: Likewise. > > > > > * gcc.dg/store_merging_4.c: Likewise. > > > > > * gcc.dg/store_merging_5.c: Likewise. > > > > > * gcc.dg/store_merging_6.c: Likewise. > > > > > * gcc.dg/store_merging_7.c: Likewise. > > > > > * gcc.target/i386/pr22141.c: Likewise. > > > > > * gcc.target/i386/pr34012.c: Add -fno-store-merging to > > > dg-options. > > > > > * g++.dg/init/new17.C: Likewise. > > > > > > > > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)