On Wed, 26 Feb 2020, luoxhu wrote:

> On 2020/2/18 17:57, Richard Biener wrote:
> > On Tue, 18 Feb 2020, Xionghu Luo wrote:
> > 
> >> Store-merging pass should run twice, the reason is pass fre/pre will do
> >> some kind of optimizations to instructions by:
> >>    1. Converting the load from address to load from function arguments
> >>    (store_merging_30.c:foo1).
> >>    2. Converting the byte access to BIT_FIELD_REF(store_merging_30.c:foo2).
> >>    3. Other bitfield combinations or potential interference optimizations 
> >> etc.
> >> These optimizations will break the store chain, store-merging pass fails
> >> to catch such kind of pattern so stores are not merged in middle end,
> >> then consecutive stb/sth instructions(should be merged to stw) are emitted
> >> finally.
> >>
> >> And why not directly move store-merging pass(numbered 194) just before
> >> fre1(numbered 35) is for case store_merging_14.c, 5 merges are done by
> >> store_merging1, and 4 merges are done fore store_merge2. So, keep the
> >> original store_merge as store_merge2 as store merge may be still available
> >> after other pass optimizations.  Most of the 30 store_merging_N.c test
> >> case dg-final pass name would be updated from store-merging to
> >> store-merging1 once this RFC patch idea got confirmed.
> >> Any comments?  Thanks.
> > 
> > Generally trying to solve a pass ordering issue by re-ordering/duplicating
> > passes might help a single testcase but will surely pessimize others.
> > So that's a no-go.
> > 
> > What the testcase shows is that store-merging needs to operate
> > similar to bswap when trying to find a "common" source for a combined
> > store.  That is, it should appropriately follow defs.  For foo2 I expect
> > it to be not too difficult, for foo1 I'd say we miss a FRE opportunity
> > here (but Jakub is working on that one IIRC).
> 
> Thanks Richard, not sure about my understanding and please correct if any.
> 
> I tried Jukub's latest patch of "sccvn: Handle bitfields in push_partial_def".
> Got to know fre pass checks the load instruction's vuse chain and do the 
> constant
> bitfield combines in push_partial_def, then 
> native_encode_expr/native_interpret_expr
> can decode and encode the constant content and shift/combine the data.  
> This should be based on one change to my test case(by adding return 
> page->counters;)
> to trigger the fre pass push all SSA name's partial_defs.  Currently, for SSA 
> variables,
> this encode/interpret is not supported yet, I suppose this is the opportunity 
> you mean.
> As this case's input is not constant, so Jukub's patch doesn't fix it.
> 
> struct page
> {
>   union
>   {
>     unsigned counters;
>     struct
>     {
>       union
>       {
>         struct
>         {
>           unsigned inuse : 16;
>           unsigned inuse2 : 8;
>           unsigned objects : 5;
>           unsigned frozen : 3;
>         };
>       };
>     };
>   };
> };
> 
> unsigned
> foo1 (struct page *page, unsigned long counters_new)
> {
>         struct page tmp;
>         tmp.counters = counters_new;
>         page->inuse   = tmp.inuse;
>         page->inuse2  = tmp.inuse2;
>         page->objects = tmp.objects;
>         page->frozen  = tmp.frozen;
>         return page->counters;
> }
> 
> If "return page->counters;" is removed, this case won't match the fre's 
> current code path
> in vn_reference_lookup_3 without a load(_14 = page_9(D)->D.2912.counters) to 
> walk all the vuses.
> So it seems not a fre opportunity but exactly a store merging issue(Also 
> checked llvm that it 
> doesn't generate byte store and short store so it only produces 1 stw for all 
> patterns).

Huh, but the code has the return page->counters and you expec it to
return counters_new unchanged.  Yes, you also expect the code to be
optimized to

   page->counters = counters_new;
   return counters_new;

and that part would indeed be a store-merging opportunity.  Currently
FRE does

  _1 = (unsigned int) counters_new_6(D);
  tmp.D.1943.counters = _1;
  _18 = (short unsigned int) counters_new_6(D);
  page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse = _18;
  _19 = BIT_FIELD_REF <_1, 8, 16>;
  page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse2 = _19;
  _4 = tmp.D.1943.D.1942.D.1941.D.1940.objects;
  page_9(D)->D.1943.D.1942.D.1941.D.1940.objects = _4;
  _5 = tmp.D.1943.D.1942.D.1941.D.1940.frozen;
  page_9(D)->D.1943.D.1942.D.1941.D.1940.frozen = _5;

so it fails to optimize the other loads to BIT_FIELD_REFs of _1.  If it
did that then store-merging woudl already handle the code I think.

> Optimize this case in pass store-merging is reasonable, just as you said,
> "trying to find a "common" source for a combined store.", it requires break
> store-merging pass's behavior: so far store-merging pass track all store
> instruction's RHS and stops when RHS is a load instruction with <base_addr, 
> bitsize,
> bitpos, bitregion_start, bitregion_end> extracted, convert instructions and 
> bitfield
> instructions generated by pass fre are ignored in 
> pass_store_merging::process_store.
> To teach the process_store capture real common source requires refactoring 
> handled_load
> to continue tracking even RHS is a load instruction and support the convert  
> instructions
> and bitfield_instructions, this seems to be a big change.

Well, as said above FRE would elide the loads and you'll see a series of
stores with BIT_FIELD_REF as sources.

> Regarding to Jakub's comments, merging could reduce many byte stores and half 
> stores
> to improve performance for this type of case.  There is already an 033t.esra 
> running before,
> and not sure whether SRA should replace such kind of bitfield operations.
> Adding a store-merging pass is so simple and many passes run more than once...
> So which would be best for this optimization, please?  Thanks again :)

I think the testcase as-is might be also optimized with Andrews pending
work to lower bitfield accesses.  So a slightly adjusted testcase not
using bitfields but maybe char (or mixed char/short) members would be
useful to track as well.  There SRA might apply (SRA backs off of
doing anything to bitfields currently).

Richard.

> Xionghu
> 
> > 
> > Richard.
> > 
> >> PS:
> >> Before this patch, store_merging_30.c.035t.fre1:
> >>
> >> ... foo1:
> >> Inserted _13 = (short unsigned int) counters_new_5(D);
> >> Replaced tmp.D.2912.D.2911.D.2910.D.2909.inuse with _13 in all uses of
> >> _1 = tmp.D.2912.D.2911.D.2910.D.2909.inuse;
> >> Removing dead stmt _1 = tmp.D.2912.D.2911.D.2910.D.2909.inuse;
> >> ... foo2:
> >> Inserted _17 = BIT_FIELD_REF <_1, 8, 16>;
> >> Replaced tmp.D.2926.D.2925.D.2924.D.2923.objects with _17 in all uses of
> >> _3 = tmp.D.2926.D.2925.D.2924.D.2923.objects;
> >> Removing dead stmt _3 = tmp.D.2926.D.2925.D.2924.D.2923.objects;
> >>
> >> foo1 asm:
> >> rldicl 9,4,48,48
> >> sth 4,0(3)
> >> sth 9,2(3)
> >> blr
> >>
> >> With this patch(similar for foo2):
> >>
> >> stw r4,0(r3)
> >> blr
> >>
> >> gcc/ChangeLog
> >>
> >> 2020-02-18  Xiong Hu Luo  <luo...@linux.ibm.com>
> >>
> >>    Part of PR middle-end/71509
> >>    gimple-ssa-store-merging.c (clone): New.
> >>    passes.def (pass_store_merging): New.
> >>
> >> gcc/testsuite/ChangeLog
> >>
> >> 2020-02-18  Xiong Hu Luo  <luo...@linux.ibm.com>
> >>
> >>    Part of PR middle-end/71509
> >>    testsuite/gcc.dg/store_merging_14.c: Update.
> >>    testsuite/gcc.dg/store_merging_30.c: New.
> >> ---
> >>   gcc/gimple-ssa-store-merging.c          |  2 +
> >>   gcc/passes.def                          |  1 +
> >>   gcc/testsuite/gcc.dg/store_merging_14.c |  3 +-
> >>   gcc/testsuite/gcc.dg/store_merging_30.c | 86 +++++++++++++++++++++++++
> >>   4 files changed, 91 insertions(+), 1 deletion(-)
> >>   create mode 100644 gcc/testsuite/gcc.dg/store_merging_30.c
> >>
> >> diff --git a/gcc/gimple-ssa-store-merging.c 
> >> b/gcc/gimple-ssa-store-merging.c
> >> index 8371323ef4a..9a5bd49fc3a 100644
> >> --- a/gcc/gimple-ssa-store-merging.c
> >> +++ b/gcc/gimple-ssa-store-merging.c
> >> @@ -2156,6 +2156,8 @@ public:
> >>     {
> >>     }
> >>   
> >> +  opt_pass * clone () { return new pass_store_merging (m_ctxt); }
> >> +
> >>     /* Pass not supported for PDP-endian, nor for insane hosts or
> >>        target character sizes where native_{encode,interpret}_expr
> >>        doesn't work properly.  */
> >> diff --git a/gcc/passes.def b/gcc/passes.def
> >> index 2bf2cb78fc5..e531531cb14 100644
> >> --- a/gcc/passes.def
> >> +++ b/gcc/passes.def
> >> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
> >>      /* pass_build_ealias is a dummy pass that ensures that we
> >>         execute TODO_rebuild_alias at this point.  */
> >>      NEXT_PASS (pass_build_ealias);
> >> +    NEXT_PASS (pass_store_merging);
> >>      NEXT_PASS (pass_fre, true /* may_iterate */);
> >>      NEXT_PASS (pass_early_vrp);
> >>      NEXT_PASS (pass_merge_phi);
> >> diff --git a/gcc/testsuite/gcc.dg/store_merging_14.c 
> >> b/gcc/testsuite/gcc.dg/store_merging_14.c
> >> index 9310aaf3489..bd120d18ac6 100644
> >> --- a/gcc/testsuite/gcc.dg/store_merging_14.c
> >> +++ b/gcc/testsuite/gcc.dg/store_merging_14.c
> >> @@ -214,4 +214,5 @@ main ()
> >>     return 0;
> >>   }
> >>   
> >> -/* { dg-final { scan-tree-dump-times "Merging successful" 9 
> >> "store-merging" } } */
> >> +/* { dg-final { scan-tree-dump-times "Merging successful" 5 
> >> "store-merging1" } } */
> >> +/* { dg-final { scan-tree-dump-times "Merging successful" 4 
> >> "store-merging2" } } */
> >> diff --git a/gcc/testsuite/gcc.dg/store_merging_30.c 
> >> b/gcc/testsuite/gcc.dg/store_merging_30.c
> >> new file mode 100644
> >> index 00000000000..71369c3b196
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/store_merging_30.c
> >> @@ -0,0 +1,86 @@
> >> +/* { dg-do run } */
> >> +/* { dg-require-effective-target store_merge } */
> >> +/* { dg-options "-O2 -fdump-tree-store-merging" } */
> >> +
> >> +typedef unsigned int atomic_t;
> >> +
> >> +struct page
> >> +{
> >> +  union
> >> +  {
> >> +    unsigned long counters;
> >> +    struct
> >> +    {
> >> +      union
> >> +      {
> >> +  struct
> >> +  {
> >> +    unsigned inuse : 16;
> >> +    unsigned objects : 15;
> >> +    unsigned frozen : 1;
> >> +  };
> >> +      };
> >> +    };
> >> +  };
> >> +};
> >> +
> >> +struct page2
> >> +{
> >> +  union
> >> +  {
> >> +    unsigned counters;
> >> +    struct
> >> +    {
> >> +      union
> >> +      {
> >> +  struct
> >> +  {
> >> +    unsigned inuse : 16;
> >> +    unsigned objects : 8;
> >> +    unsigned frozen : 8;
> >> +  };
> >> +      };
> >> +    };
> >> +  };
> >> +};
> >> +
> >> +__attribute__((noipa)) void
> >> +foo1 (struct page *page, unsigned long counters_new)
> >> +{
> >> +        struct page tmp;
> >> +        tmp.counters = counters_new;
> >> +        page->inuse   = tmp.inuse;
> >> +        page->objects = tmp.objects;
> >> +        page->frozen  = tmp.frozen;
> >> +}
> >> +
> >> +__attribute__((noipa)) void
> >> +foo2 (struct page2 *page2, unsigned long counters_new)
> >> +{
> >> +        struct page2 tmp;
> >> +        tmp.counters = counters_new;
> >> +        page2->inuse   = tmp.inuse;
> >> +        page2->objects = tmp.objects;
> >> +        page2->frozen  = tmp.frozen;
> >> +}
> >> +
> >> +int main ()
> >> +{
> >> +  struct page page;
> >> +  foo1 (&page, 0x12345678ABCDEFFEUL);
> >> +
> >> +  asm volatile ("" : : : "memory");
> >> +  if (page.frozen != 1 || page.objects != 0x2bcd || page.inuse != 0xeffe)
> >> +    __builtin_abort ();
> >> +
> >> +  struct page2 page2;
> >> +  foo2 (&page2, 0x12345678ABCDEFFEUL);
> >> +
> >> +  asm volatile ("" : : : "memory");
> >> +
> >> +  if (page2.frozen != 0xab || page2.objects != 0xcd || page2.inuse != 
> >> 0xeffe)
> >> +    __builtin_abort ();
> >> +  return 0;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times "Merging successful" 2 
> >> "store-merging1" } } */
> >>
> > 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to