On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law <l...@redhat.com> wrote: > This is the second patch in the kit to improve our DSE implementation. > > This patch recognizes when a CONSTRUCTOR assignment could be trimmed at the > head or tail because those bytes are dead. > > The first implementation of this turned the CONSTRUCTOR into a memset. This > version actually rewrites the RHS and LHS of the CONSTRUCTOR assignment. > > You'll note that the implementation computes head and tail trim counts, then > masks them to an even byte count. We might even consider masking off the > two low bits in the counts. This masking keeps higher alignments on the > CONSTRUCTOR remnant which helps keep things efficient when the CONSTRUCTOR > results in a memset call. > > This patch hits a lot statically in GCC and the testsuite. There were > hundreds of hits in each. > > There may be some room for tuning. Trimming shouldn't ever result in poorer > performance, but it may also not result in any measurable gain (it depends > on how much gets trimmed relative to the size of the CONSTRUCTOR node and > how the CONSTRUCTOR node gets expanded, the processor's capabilities for > merging stores internally, etc etc). I suspect the main benefit comes when > the CONSTRUCTOR collapses down to some thing small that gets expanded > inline, thus exposing the internals to the rest of the optimization > pipeline. > > We could, in theory, split the CONSTRUCTOR to pick up dead bytes in the > middle of the CONSTRUCTOR. I haven't looked to see how applicable that is > in real code and what the cost/benefit analysis might look like. > > > Bootstrapped and regression tested on x86_64-linux-gnu. OK for the trunk?
+ categorize_ctor_elements (ctor, &nz_elts, &init_elts, &complete); + + /* This is the only case we currently handle. It actually seems to + catch most cases of actual interest. */ + if (init_elts == 0 && !CONSTRUCTOR_NO_CLEARING (ctor)) actually the _only_ case allowed as RHS of stores is an empty CONSTRUCTOR and thus clearing of the object. Thus gcc_assert (CONSTRUCTOR_NELTS (ctor) == 0); should work here. + /* And the new type for the CONSTRUCTOR. Essentially it's just + a char array large enough to cover the non-trimmed parts of + the original CONSTRUCTOR. Note we want explicit bounds here + so that we know how many bytes to clear when expanding the + CONSTRUCTOR. */ + tree type = build_range_type (sizetype, size_zero_node, + build_int_cst (sizetype, count)); + type = build_array_type (char_type_node, type); there is build_array_type_nelts combining both (note nelts rather than upper bound). + /* Build a MEM_REF representing the whole accessed area, starting + at the first byte not trimmed. */ + tree exp = fold_build2 (MEM_REF, type, lhs_addr, + build_int_cst (build_pointer_type (type), + head_trim)); this doesn't preserve alias info but pessimistically uses alias set zero. I believe you can use tree alias_type = reference_alias_ptr_type (gimple_assign_lhs (stmt)); ..., build_int_cst (alias_type, head_trim) to do better. Note that you end up generating invalid gimple (not sure what you are doing wiht the tree-sra stuff in this patch - it seems to be unused). Consider a[i] = {}; where trimming generates MEM[&a[i], head_trim] = {}; that is invalid GIMPLE. I guess we never run into this case because for the LHS max_size != size during analysis? To be on the safe side please add a if (! is_gimple_min_invariant (lhs_addr)) return; after computing lhs_addr. Sth to keep in mind is that nothing on GIMPLE changes the store from a CONSTRUCTOR to a store from literal zero. So even if you end up with a int-size and int-aligned MEM <char[4]> [&x, 12] = {}; nothing (but RTL expansion) rewrites it to MEM <int> [&x, 12] = 0; that's something missing from gimple-fold.c (some code can be shared with MEMSET folding I guess). Otherwise the patch looks ok to me. Thanks, Richard. > PR tree-optimization/61912 > PR tree-optimization/77485 > * tree-sra.h: New file. > * ipa-cp.c: Include tree-sra.h > * ipa-prop.h (build_ref_for_offset): Remove prototype. > * tree-ssa-dse.c: Include expr.h and tree-sra.h. > (compute_trims, trim_constructor_store): New functions. > (maybe_trim_partially_dead_store): Call trim_constructor_store. > > > * g++.dg/tree-ssa/ssa-dse-1.C: New test. > * gcc.dg/tree-ssa/pr30375: Adjust expected output. > * gcc.dg/tree-ssa/ssa-dse-24.c: New test. > > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c > index d3b5052..bc5ea87 100644 > --- a/gcc/ipa-cp.c > +++ b/gcc/ipa-cp.c > @@ -122,6 +122,7 @@ along with GCC; see the file COPYING3. If not see > #include "ipa-inline.h" > #include "ipa-utils.h" > #include "tree-ssa-ccp.h" > +#include "tree-sra.h" > > template <typename valtype> class ipcp_value; > > diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h > index 0e75cf4..6d7b480 100644 > --- a/gcc/ipa-prop.h > +++ b/gcc/ipa-prop.h > @@ -820,10 +820,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree > **, bool *, > void ipa_release_body_info (struct ipa_func_body_info *); > tree ipa_get_callee_param_type (struct cgraph_edge *e, int i); > > -/* From tree-sra.c: */ > -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree, > - gimple_stmt_iterator *, bool); > - > /* In ipa-cp.c */ > void ipa_cp_c_finalize (void); > > diff --git a/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C > b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C > new file mode 100644 > index 0000000..3f85f3a > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C > @@ -0,0 +1,101 @@ > +/* { dg-do compile } */ > +/* { dg-options "-std=c++14 -O -fdump-tree-dse1-details" } */ > + > +using uint = unsigned int; > + > +template<typename C, uint S> > +struct FixBuf > +{ > + C buf[S] = {}; > +}; > + > +template<typename C> > +struct OutBuf > +{ > + C* cur; > + C* end; > + C* beg; > + > + template<uint S> > + constexpr > + OutBuf(FixBuf<C, S>& b) : cur{b.buf}, end{b.buf + S}, beg{b.buf} { } > + > + OutBuf(C* b, C* e) : cur{b}, end{e} { } > + OutBuf(C* b, uint s) : cur{b}, end{b + s} { } > + > + constexpr > + OutBuf& operator<<(C v) > + { > + if (cur < end) { > + *cur = v; > + } > + ++cur; > + return *this; > + } > + > + constexpr > + OutBuf& operator<<(uint v) > + { > + uint q = v / 10U; > + uint r = v % 10U; > + if (q) { > + *this << q; > + } > + *this << static_cast<C>(r + '0'); > + return *this; > + } > +}; > + > +template<bool BOS> > +struct BufOrSize > +{ > + template<typename C, uint S> > + static constexpr auto Select(FixBuf<C, S>& fb, OutBuf<C>&) > + { > + return fb; > + } > +}; > + > +template<> > +struct BufOrSize<true> > +{ > + template<typename C, uint S> > + static constexpr auto Select(FixBuf<C, S>&, OutBuf<C>& ob) > + { > + return ob.cur - ob.beg; > + } > +}; > + > +// if BOS=1, it will return the size of the generated data, else the data > itself > +template<uint N, uint S, bool BOS = 0> > +constexpr > +auto fixbuf() > +{ > + FixBuf<char, S> fb; > + OutBuf<char> ob{fb}; > + for (uint i = 0; i <= N; ++i) { > + ob << i << static_cast<char>(i == N ? 0 : ' '); > + } > + return BufOrSize<BOS>::Select(fb, ob); > +} > + > +auto foo() > +{ > + constexpr auto x = fixbuf<13, 200>(); > + return x; > +} > + > +auto foo_sized() > +{ > + constexpr auto s = fixbuf<13, 0, 1>(); > + constexpr auto x = fixbuf<13, s>(); > + return x; > +} > + > +int main() > +{ > +} > + > + > +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\] > \\*\\)&<retval> \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */ > + > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c > index 0439b1c..cf0a93b 100644 > --- a/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c > @@ -22,4 +22,5 @@ void test_signed_msg_encoding(void) > f(); > } > > -/* { dg-final { scan-tree-dump-times "signInfo = {}" 1 "dse1" } } */ > +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\] > \\*\\)&signInfo \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */ > + > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c > b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c > new file mode 100644 > index 0000000..80a35ca > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c > @@ -0,0 +1,62 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-dse1" } */ > + > + > +typedef unsigned int wchar_t; > +struct printf_info > +{ > + int prec; > + int width; > + wchar_t spec; > + unsigned int is_long_double:1; > + unsigned int is_short:1; > + unsigned int is_long:1; > + unsigned int alt:1; > + unsigned int space:1; > + unsigned int left:1; > + unsigned int showsign:1; > + unsigned int group:1; > + unsigned int extra:1; > + unsigned int is_char:1; > + unsigned int wide:1; > + unsigned int i18n:1; > + unsigned int __pad:4; > + unsigned short int user; > + wchar_t pad; > +} info; > + > +void bar (struct printf_info *); > + > +void foo(int prec, > + int width, > + wchar_t spec, > + unsigned int is_long_double, > + unsigned int is_short, > + unsigned int is_long, > + unsigned int alt, > + unsigned int space, > + unsigned int left, > + unsigned int showsign, > + unsigned int group, > + wchar_t pad) > +{ > + struct printf_info info = { > + .prec = prec, > + .width = width, > + .spec = spec, > + .is_long_double = is_long_double, > + .is_short = is_short, > + .is_long = is_long, > + .alt = alt, > + .space = space, > + .left = left, > + .showsign = showsign, > + .group = group, > + .pad = pad, > + .extra = 0, > + .wide = sizeof (char) != 1 }; > + > + bar (&info); > +} > + > +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(char\\\[\[0-9\]+\\\] > \\*\\)&info \\+ \[0-9\]+B\\\] = {}" 1 "dse1" } } */ > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h > new file mode 100644 > index 0000000..6d88ece > --- /dev/null > +++ b/gcc/tree-sra.h > @@ -0,0 +1,27 @@ > +/* Header file for SRA helper functions > + Copyright (C) 2016 Free Software Foundation, Inc. > + > +This file is part of GCC. > + > +GCC is free software; you can redistribute it and/or modify it under > +the terms of the GNU General Public License as published by the Free > +Software Foundation; either version 3, or (at your option) any later > +version. > + > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +WARRANTY; without even the implied warranty of MERCHANTABILITY or > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > + for more details. > + > +You should have received a copy of the GNU General Public License > +along with GCC; see the file COPYING3. If not see > +<http://www.gnu.org/licenses/>. */ > + > +#ifndef GCC_TREE_SRA_H > +#define GCC_TREE_SRA_H > + > +tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree, > + gimple_stmt_iterator *, bool); > + > + > +#endif /* GCC_TREE_SRA_H */ > diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c > index 40acd83..4757702 100644 > --- a/gcc/tree-ssa-dse.c > +++ b/gcc/tree-ssa-dse.c > @@ -34,6 +34,8 @@ along with GCC; see the file COPYING3. If not see > #include "domwalk.h" > #include "tree-cfgcleanup.h" > #include "params.h" > +#include "expr.h" > +#include "tree-sra.h" > > /* This file implements dead store elimination. > > @@ -272,6 +274,68 @@ maybe_trim_complex_store (ao_ref *ref, sbitmap live, > gimple *stmt) > are live. We do not try to optimize those cases. */ > } > > +/* STMT initializes an object using a CONSTRUCTOR where one or more of the > + bytes written are dead stores. ORIG is the bitmap of bytes stored by > + STMT. LIVE is the bitmap of stores that are actually live. > + > + Attempt to rewrite STMT so that only the real or imaginary part of > + the object is actually stored. > + > + The most common case for getting here is a CONSTRUCTOR with no elements > + being used to zero initialize an object. We do not try to handle other > + cases as those would force us to fully cover the object with the > + CONSTRUCTOR node except for the components that are dead. */ > + > +static void > +trim_constructor_store (ao_ref *ref, sbitmap live, gimple *stmt) > +{ > + tree ctor = gimple_assign_rhs1 (stmt); > + > + HOST_WIDE_INT nz_elts, init_elts; > + bool complete; > + categorize_ctor_elements (ctor, &nz_elts, &init_elts, &complete); > + > + /* This is the only case we currently handle. It actually seems to > + catch most cases of actual interest. */ > + if (init_elts == 0 && !CONSTRUCTOR_NO_CLEARING (ctor)) > + { > + int head_trim = 0; > + int tail_trim = 0; > + compute_trims (ref, live, &head_trim, &tail_trim); > + > + /* Now we want to replace the constructor initializer > + with memset (object + head_trim, 0, size - head_trim - tail_trim). > */ > + if (head_trim || tail_trim) > + { > + /* We want &lhs for the MEM_REF expression. */ > + tree lhs_addr = build_fold_addr_expr (gimple_assign_lhs (stmt)); > + > + /* The number of bytes for the new constructor. */ > + int count = (ref->size / BITS_PER_UNIT) - head_trim - tail_trim - > 1; > + > + /* And the new type for the CONSTRUCTOR. Essentially it's just > + a char array large enough to cover the non-trimmed parts of > + the original CONSTRUCTOR. Note we want explicit bounds here > + so that we know how many bytes to clear when expanding the > + CONSTRUCTOR. */ > + tree type = build_range_type (sizetype, size_zero_node, > + build_int_cst (sizetype, count)); > + type = build_array_type (char_type_node, type); > + > + > + /* Build a MEM_REF representing the whole accessed area, starting > + at the first byte not trimmed. */ > + tree exp = fold_build2 (MEM_REF, type, lhs_addr, > + build_int_cst (build_pointer_type (type), > + head_trim)); > + > + /* Now update STMT with a new RHS and LHS. */ > + gimple_assign_set_lhs (stmt, exp); > + gimple_assign_set_rhs1 (stmt, build_constructor (type, NULL)); > + } > + } > +} > + > /* STMT is a memory write where one or more bytes written are dead > stores. ORIG is the bitmap of bytes stored by STMT. LIVE is the > bitmap of stores that are actually live. > @@ -288,6 +352,9 @@ maybe_trim_partially_dead_store (ao_ref *ref, sbitmap > live, gimple *stmt) > { > switch (gimple_assign_rhs_code (stmt)) > { > + case CONSTRUCTOR: > + trim_constructor_store (ref, live, stmt); > + break; > case COMPLEX_CST: > maybe_trim_complex_store (ref, live, stmt); > break; >