On Wed, 26 Feb 2020, luoxhu wrote: > On 2020/2/18 17:57, Richard Biener wrote: > > On Tue, 18 Feb 2020, Xionghu Luo wrote: > > > >> Store-merging pass should run twice, the reason is pass fre/pre will do > >> some kind of optimizations to instructions by: > >> 1. Converting the load from address to load from function arguments > >> (store_merging_30.c:foo1). > >> 2. Converting the byte access to BIT_FIELD_REF(store_merging_30.c:foo2). > >> 3. Other bitfield combinations or potential interference optimizations > >> etc. > >> These optimizations will break the store chain, store-merging pass fails > >> to catch such kind of pattern so stores are not merged in middle end, > >> then consecutive stb/sth instructions(should be merged to stw) are emitted > >> finally. > >> > >> And why not directly move store-merging pass(numbered 194) just before > >> fre1(numbered 35) is for case store_merging_14.c, 5 merges are done by > >> store_merging1, and 4 merges are done fore store_merge2. So, keep the > >> original store_merge as store_merge2 as store merge may be still available > >> after other pass optimizations. Most of the 30 store_merging_N.c test > >> case dg-final pass name would be updated from store-merging to > >> store-merging1 once this RFC patch idea got confirmed. > >> Any comments? Thanks. > > > > Generally trying to solve a pass ordering issue by re-ordering/duplicating > > passes might help a single testcase but will surely pessimize others. > > So that's a no-go. > > > > What the testcase shows is that store-merging needs to operate > > similar to bswap when trying to find a "common" source for a combined > > store. That is, it should appropriately follow defs. For foo2 I expect > > it to be not too difficult, for foo1 I'd say we miss a FRE opportunity > > here (but Jakub is working on that one IIRC). > > Thanks Richard, not sure about my understanding and please correct if any. > > I tried Jukub's latest patch of "sccvn: Handle bitfields in push_partial_def". > Got to know fre pass checks the load instruction's vuse chain and do the > constant > bitfield combines in push_partial_def, then > native_encode_expr/native_interpret_expr > can decode and encode the constant content and shift/combine the data. > This should be based on one change to my test case(by adding return > page->counters;) > to trigger the fre pass push all SSA name's partial_defs. Currently, for SSA > variables, > this encode/interpret is not supported yet, I suppose this is the opportunity > you mean. > As this case's input is not constant, so Jukub's patch doesn't fix it. > > struct page > { > union > { > unsigned counters; > struct > { > union > { > struct > { > unsigned inuse : 16; > unsigned inuse2 : 8; > unsigned objects : 5; > unsigned frozen : 3; > }; > }; > }; > }; > }; > > unsigned > foo1 (struct page *page, unsigned long counters_new) > { > struct page tmp; > tmp.counters = counters_new; > page->inuse = tmp.inuse; > page->inuse2 = tmp.inuse2; > page->objects = tmp.objects; > page->frozen = tmp.frozen; > return page->counters; > } > > If "return page->counters;" is removed, this case won't match the fre's > current code path > in vn_reference_lookup_3 without a load(_14 = page_9(D)->D.2912.counters) to > walk all the vuses. > So it seems not a fre opportunity but exactly a store merging issue(Also > checked llvm that it > doesn't generate byte store and short store so it only produces 1 stw for all > patterns).
Huh, but the code has the return page->counters and you expec it to return counters_new unchanged. Yes, you also expect the code to be optimized to page->counters = counters_new; return counters_new; and that part would indeed be a store-merging opportunity. Currently FRE does _1 = (unsigned int) counters_new_6(D); tmp.D.1943.counters = _1; _18 = (short unsigned int) counters_new_6(D); page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse = _18; _19 = BIT_FIELD_REF <_1, 8, 16>; page_9(D)->D.1943.D.1942.D.1941.D.1940.inuse2 = _19; _4 = tmp.D.1943.D.1942.D.1941.D.1940.objects; page_9(D)->D.1943.D.1942.D.1941.D.1940.objects = _4; _5 = tmp.D.1943.D.1942.D.1941.D.1940.frozen; page_9(D)->D.1943.D.1942.D.1941.D.1940.frozen = _5; so it fails to optimize the other loads to BIT_FIELD_REFs of _1. If it did that then store-merging woudl already handle the code I think. > Optimize this case in pass store-merging is reasonable, just as you said, > "trying to find a "common" source for a combined store.", it requires break > store-merging pass's behavior: so far store-merging pass track all store > instruction's RHS and stops when RHS is a load instruction with <base_addr, > bitsize, > bitpos, bitregion_start, bitregion_end> extracted, convert instructions and > bitfield > instructions generated by pass fre are ignored in > pass_store_merging::process_store. > To teach the process_store capture real common source requires refactoring > handled_load > to continue tracking even RHS is a load instruction and support the convert > instructions > and bitfield_instructions, this seems to be a big change. Well, as said above FRE would elide the loads and you'll see a series of stores with BIT_FIELD_REF as sources. > Regarding to Jakub's comments, merging could reduce many byte stores and half > stores > to improve performance for this type of case. There is already an 033t.esra > running before, > and not sure whether SRA should replace such kind of bitfield operations. > Adding a store-merging pass is so simple and many passes run more than once... > So which would be best for this optimization, please? Thanks again :) I think the testcase as-is might be also optimized with Andrews pending work to lower bitfield accesses. So a slightly adjusted testcase not using bitfields but maybe char (or mixed char/short) members would be useful to track as well. There SRA might apply (SRA backs off of doing anything to bitfields currently). Richard. > Xionghu > > > > > Richard. > > > >> PS: > >> Before this patch, store_merging_30.c.035t.fre1: > >> > >> ... foo1: > >> Inserted _13 = (short unsigned int) counters_new_5(D); > >> Replaced tmp.D.2912.D.2911.D.2910.D.2909.inuse with _13 in all uses of > >> _1 = tmp.D.2912.D.2911.D.2910.D.2909.inuse; > >> Removing dead stmt _1 = tmp.D.2912.D.2911.D.2910.D.2909.inuse; > >> ... foo2: > >> Inserted _17 = BIT_FIELD_REF <_1, 8, 16>; > >> Replaced tmp.D.2926.D.2925.D.2924.D.2923.objects with _17 in all uses of > >> _3 = tmp.D.2926.D.2925.D.2924.D.2923.objects; > >> Removing dead stmt _3 = tmp.D.2926.D.2925.D.2924.D.2923.objects; > >> > >> foo1 asm: > >> rldicl 9,4,48,48 > >> sth 4,0(3) > >> sth 9,2(3) > >> blr > >> > >> With this patch(similar for foo2): > >> > >> stw r4,0(r3) > >> blr > >> > >> gcc/ChangeLog > >> > >> 2020-02-18 Xiong Hu Luo <luo...@linux.ibm.com> > >> > >> Part of PR middle-end/71509 > >> gimple-ssa-store-merging.c (clone): New. > >> passes.def (pass_store_merging): New. > >> > >> gcc/testsuite/ChangeLog > >> > >> 2020-02-18 Xiong Hu Luo <luo...@linux.ibm.com> > >> > >> Part of PR middle-end/71509 > >> testsuite/gcc.dg/store_merging_14.c: Update. > >> testsuite/gcc.dg/store_merging_30.c: New. > >> --- > >> gcc/gimple-ssa-store-merging.c | 2 + > >> gcc/passes.def | 1 + > >> gcc/testsuite/gcc.dg/store_merging_14.c | 3 +- > >> gcc/testsuite/gcc.dg/store_merging_30.c | 86 +++++++++++++++++++++++++ > >> 4 files changed, 91 insertions(+), 1 deletion(-) > >> create mode 100644 gcc/testsuite/gcc.dg/store_merging_30.c > >> > >> diff --git a/gcc/gimple-ssa-store-merging.c > >> b/gcc/gimple-ssa-store-merging.c > >> index 8371323ef4a..9a5bd49fc3a 100644 > >> --- a/gcc/gimple-ssa-store-merging.c > >> +++ b/gcc/gimple-ssa-store-merging.c > >> @@ -2156,6 +2156,8 @@ public: > >> { > >> } > >> > >> + opt_pass * clone () { return new pass_store_merging (m_ctxt); } > >> + > >> /* Pass not supported for PDP-endian, nor for insane hosts or > >> target character sizes where native_{encode,interpret}_expr > >> doesn't work properly. */ > >> diff --git a/gcc/passes.def b/gcc/passes.def > >> index 2bf2cb78fc5..e531531cb14 100644 > >> --- a/gcc/passes.def > >> +++ b/gcc/passes.def > >> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3. If not see > >> /* pass_build_ealias is a dummy pass that ensures that we > >> execute TODO_rebuild_alias at this point. */ > >> NEXT_PASS (pass_build_ealias); > >> + NEXT_PASS (pass_store_merging); > >> NEXT_PASS (pass_fre, true /* may_iterate */); > >> NEXT_PASS (pass_early_vrp); > >> NEXT_PASS (pass_merge_phi); > >> diff --git a/gcc/testsuite/gcc.dg/store_merging_14.c > >> b/gcc/testsuite/gcc.dg/store_merging_14.c > >> index 9310aaf3489..bd120d18ac6 100644 > >> --- a/gcc/testsuite/gcc.dg/store_merging_14.c > >> +++ b/gcc/testsuite/gcc.dg/store_merging_14.c > >> @@ -214,4 +214,5 @@ main () > >> return 0; > >> } > >> > >> -/* { dg-final { scan-tree-dump-times "Merging successful" 9 > >> "store-merging" } } */ > >> +/* { dg-final { scan-tree-dump-times "Merging successful" 5 > >> "store-merging1" } } */ > >> +/* { dg-final { scan-tree-dump-times "Merging successful" 4 > >> "store-merging2" } } */ > >> diff --git a/gcc/testsuite/gcc.dg/store_merging_30.c > >> b/gcc/testsuite/gcc.dg/store_merging_30.c > >> new file mode 100644 > >> index 00000000000..71369c3b196 > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.dg/store_merging_30.c > >> @@ -0,0 +1,86 @@ > >> +/* { dg-do run } */ > >> +/* { dg-require-effective-target store_merge } */ > >> +/* { dg-options "-O2 -fdump-tree-store-merging" } */ > >> + > >> +typedef unsigned int atomic_t; > >> + > >> +struct page > >> +{ > >> + union > >> + { > >> + unsigned long counters; > >> + struct > >> + { > >> + union > >> + { > >> + struct > >> + { > >> + unsigned inuse : 16; > >> + unsigned objects : 15; > >> + unsigned frozen : 1; > >> + }; > >> + }; > >> + }; > >> + }; > >> +}; > >> + > >> +struct page2 > >> +{ > >> + union > >> + { > >> + unsigned counters; > >> + struct > >> + { > >> + union > >> + { > >> + struct > >> + { > >> + unsigned inuse : 16; > >> + unsigned objects : 8; > >> + unsigned frozen : 8; > >> + }; > >> + }; > >> + }; > >> + }; > >> +}; > >> + > >> +__attribute__((noipa)) void > >> +foo1 (struct page *page, unsigned long counters_new) > >> +{ > >> + struct page tmp; > >> + tmp.counters = counters_new; > >> + page->inuse = tmp.inuse; > >> + page->objects = tmp.objects; > >> + page->frozen = tmp.frozen; > >> +} > >> + > >> +__attribute__((noipa)) void > >> +foo2 (struct page2 *page2, unsigned long counters_new) > >> +{ > >> + struct page2 tmp; > >> + tmp.counters = counters_new; > >> + page2->inuse = tmp.inuse; > >> + page2->objects = tmp.objects; > >> + page2->frozen = tmp.frozen; > >> +} > >> + > >> +int main () > >> +{ > >> + struct page page; > >> + foo1 (&page, 0x12345678ABCDEFFEUL); > >> + > >> + asm volatile ("" : : : "memory"); > >> + if (page.frozen != 1 || page.objects != 0x2bcd || page.inuse != 0xeffe) > >> + __builtin_abort (); > >> + > >> + struct page2 page2; > >> + foo2 (&page2, 0x12345678ABCDEFFEUL); > >> + > >> + asm volatile ("" : : : "memory"); > >> + > >> + if (page2.frozen != 0xab || page2.objects != 0xcd || page2.inuse != > >> 0xeffe) > >> + __builtin_abort (); > >> + return 0; > >> +} > >> + > >> +/* { dg-final { scan-tree-dump-times "Merging successful" 2 > >> "store-merging1" } } */ > >> > > > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)