On Mon, 24 Oct 2016, Kyrill Tkachov wrote:

> 
> On 21/10/16 19:37, Richard Biener wrote:
> > On October 21, 2016 3:29:15 PM GMT+02:00, Kyrill Tkachov
> > <kyrylo.tkac...@foss.arm.com> wrote:
> > > Hi Richard,
> > > 
> > > On 21/10/16 13:37, Richard Biener wrote:
> > > > On Tue, 18 Oct 2016, Kyrill Tkachov wrote:
> > > > 
> > > > > Hi Richard,
> > > > > 
> > > > > This patch is a merge of [1] and [2] and implements the manual
> > > merging of
> > > > > bitfields
> > > > > as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN
> > > too.
> > > > > It caused me a lot of headeache because the bit offset is counted
> > >from the
> > > > > most significant bit
> > > > > in the byte, even though BITS_BIG_ENDIAN was 0 (BITS_BIG_ENDIAN
> > > looks
> > > > > irrelevant for store merging
> > > > > anyway as it's just used to described RTL extract operations).
> > > > > I've included ASCII diagrams of the steps in the merging algorithm.
> > > > Heh, thanks.
> > > > 
> > > > > Bootstrapped and tested on arm-none-linux-gnueabihf,
> > > aarch64-none-linux-gnu,
> > > > > x86_64-unknown-linux-gnu.
> > > > > Also tested on aarch64_be-none-elf.
> > > > > 
> > > > > How does this version look now?
> > > > Mostly good.  For
> > > > 
> > > > +bool
> > > > +pass_store_merging::terminate_all_aliasing_chains (tree dest, tree
> > > base,
> > > > +                                                  gimple *stmt)
> > > > +{
> > > > ...
> > > > +  /* Check if the assignment destination (BASE) is part of a store
> > > chain.
> > > > +     This is to catch non-constant stores to destinations that may
> > > be
> > > > part
> > > > +     of a chain.  */
> > > > +  if (base)
> > > > +    {
> > > > +      chain_info = m_stores.get (base);
> > > > +      if (chain_info)
> > > > +       {
> > > > +         struct store_immediate_info *info;
> > > > +         unsigned int i;
> > > > +         FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
> > > > +           {
> > > > +             if (refs_may_alias_p (info->dest, dest))
> > > > +               {
> > > > 
> > > > I suppose the chain is not yet sorted in any way?
> > > > 
> > > > At least for 'dest' which do not have a known constant offset we
> > > > could do
> > > > 
> > > >      if (base)
> > > >        terminate_and_release_chain (base);
> > > Do you mean when get_inner_reference returns non-NULL for POFFSET?
> > Yes.
> > 
> > > Or do you think we should try to look into dest in this function?
> > > 
> > > > to speed things up?  IIRC we do not terminate chains early in
> > > > this phase when we have enough stores to form a group, so
> > > > writing a testcase that triggers quadraticness would be as simple
> > > > as having
> > > > 
> > > > char a[1000000];
> > > > 
> > > > void foo ()
> > > > {
> > > >    a[0] = 1;
> > > >    a[1] = 2;
> > > >    ....
> > > >    a[9999999] = 3;
> > > > }
> > > > 
> > > > ?
> > > > 
> > > > so I think you probably want to limit the number of stores you
> > > > ever put onto a chain and if you reach that limit, terminate
> > > > and release it?  Like just choose 16 or 64?  (and experiment
> > > > with the above kind of testcases)
> > > I was initially thinking of imposing such a limit as well but
> > > later I thought we'd want to extend the output code to be able to emit
> > > a memcpy (or memset) call for large regions, so detecting the largest
> > > possible
> > > regions would be needed.
> > But then we need a better data structure here to avoid the quadraticness.
> 
> For the testcase:
> 
> char a[1000000];
> 
> void foo ()
> {
>   a[0] = 1;
>   a[1] = 2;
>   ....
>   a[9999999] = 3;
> }
> 
> and similar we don't have quadratic behaviour because constant stores to a
> constant offset
> from base (i.e. stores valid for merging) get automatically inserted into the
> chain without
> doing the alias checking (we keep track of their program order so that
> overlapping stores
> are merged properly). Since pushing into a vec is not a linear operation we're
> fine.

Good.

> Once I teach it to quickly terminate chains when encountering stores to base
> with a variable
> offset (like you suggest above) there is only one case of quadraticness:
> We walk the stores in the chain (causing quadratic behaviour) when among
> the valid mergeable stores we have interspersed stores to a constant offset
> from base but that
> do not store a constant value e.g. a[5] = x;
> So the input needs to be something like:
> 
> char a[16000];
> 
> void
> foo (char x)
> {
>   a[0] = 1; a[8000 + 0] = x;
>   a[1] = 1; a[8000 + 1] = x;
>   a[2] = 1; a[8000 + 2] = x;
>   a[3] = 1; a[8000 + 3] = x;
>   a[4] = 1; a[8000 + 4] = x;
>   ...
>   a[7999] = 1; a[8000 + 7999] = x;
> }
> 
> I've reproduced a slowdown (with store merging taking a disproportional
> time in -ftime-report) with this case and I've verified that limiting the
> maximum number of statements in the chain fixes it.

OK - as said, the algorithm trivially looks quadratic in some cases
and we should generally avoid such behavior (and it is easy to do so).

> 
> > 
> >   But that is not implemented yet (though I have
> > > experimented
> > > with it) so I can add a limit here. Should I just hardcode a limit or
> > > should I make it
> > > into a --param (MAX_STMTS_IN_STORE_MERGING_CHAIN or something)?
> > A param is preferred.
> 
> Thanks. I'll introduce the limit but, as mentioned above, the simple case
> of inserting multiple stores is not quadratic.

I just tried to thin of the most simple testcase to trigger it...

Thanks,
Richard.

> Kyrill
> 
> > 
> > > > +                 bit_off = byte_off << LOG2_BITS_PER_UNIT;
> > > > +                 if (!wi::neg_p (bit_off) && wi::fits_shwi_p
> > > (bit_off))
> > > > +                   {
> > > > +                     bitpos += bit_off.to_shwi ();
> > > > +
> > > > 
> > > > I think you want bit_off += bitpos before the fits_shwi check
> > > > otherwise this add may still overflow.
> > > > 
> > > > +                     base_addr = copy_node (base_addr);
> > > > +                     TREE_OPERAND (base_addr, 1)
> > > > +                       = build_zero_cst (TREE_TYPE (TREE_OPERAND (
> > > > +                                                      base_addr,
> > > 1)));
> > > > I'd prefer
> > > > 
> > > >         base_addr = build2 (MEM_REF, ...);
> > > > 
> > > > here.
> > > Thanks for the feedback,
> > > Kyrill
> > > 
> > > > Thanks,
> > > > Richard.
> > > > 
> > > > > Thanks,
> > > > > Kyrill
> > > > > 
> > > > > [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html
> > > > > [2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html
> > > > > 
> > > > > 2016-10-18  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> > > > > 
> > > > >       PR middle-end/22141
> > > > >       * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
> > > > >       * common.opt (fstore-merging): New Optimization option.
> > > > >       * opts.c (default_options_table): Add entry for
> > > > >       OPT_ftree_store_merging.
> > > > >       * fold-const.h (can_native_encode_type_p): Declare prototype.
> > > > >       * fold-const.c (can_native_encode_type_p): Define.
> > > > >       * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
> > > > >       * passes.def: Insert pass_tree_store_merging.
> > > > >       * tree-pass.h (make_pass_store_merging): Declare extern
> > > > >       prototype.
> > > > >       * gimple-ssa-store-merging.c: New file.
> > > > >       * doc/invoke.texi (Optimization Options): Document
> > > > >       -fstore-merging.
> > > > > 
> > > > > 2016-10-18  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> > > > >               Jakub Jelinek  <ja...@redhat.com>
> > > > >               Andrew Pinski  <pins...@gmail.com>
> > > > > 
> > > > >       PR middle-end/22141
> > > > >       PR rtl-optimization/23684
> > > > >       * gcc.c-torture/execute/pr22141-1.c: New test.
> > > > >       * gcc.c-torture/execute/pr22141-2.c: Likewise.
> > > > >       * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
> > > > >       * gcc.target/aarch64/ldp_stp_4.c: Likewise.
> > > > >       * gcc.dg/store_merging_1.c: New test.
> > > > >       * gcc.dg/store_merging_2.c: Likewise.
> > > > >       * gcc.dg/store_merging_3.c: Likewise.
> > > > >       * gcc.dg/store_merging_4.c: Likewise.
> > > > >       * gcc.dg/store_merging_5.c: Likewise.
> > > > >       * gcc.dg/store_merging_6.c: Likewise.
> > > > >       * gcc.dg/store_merging_7.c: Likewise.
> > > > >       * gcc.target/i386/pr22141.c: Likewise.
> > > > >       * gcc.target/i386/pr34012.c: Add -fno-store-merging to
> > > dg-options.
> > > > >       * g++.dg/init/new17.C: Likewise.
> > > > > 
> > 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Reply via email to