On Thu, 4 Nov 2021, Jakub Jelinek wrote:

> Hi!
>
> When users don't use constexpr everywhere in the initialization of
> namespace-scope non-comdat vars and the initializers aren't constant when
> the FE looks at them, the FE performs dynamic initialization of those
> variables.  But after inlining and some constant propagation, we often
> end up just storing constants into those variables in the
> _GLOBAL__sub_I_* constructor.
> C++ gives us permission to change some of that dynamic initialization
> back into static initialization - https://eel.is/c++draft/basic.start.static#3
> For classes that need (dynamic) construction, I believe access to some
> var from other dynamic construction before that var is constructed is UB,
> but as the example in the above-mentioned spot of C++ shows:
>   inline double fd() { return 1.0; }
>   extern double d1;
>   double d2 = d1;   // unspecified:
>                     // either statically initialized to 0.0 or
>                     // dynamically initialized to 0.0 if d1 is
>                     // dynamically initialized, or 1.0 otherwise
>   double d1 = fd(); // either initialized statically or dynamically to 1.0
> some vars can be used before they are dynamically initialized and the
> implementation can still optimize those into static initialization.
>
> The following patch attempts to optimize some such cases back into
> DECL_INITIAL initializers and, where possible (originally const vars
> without mutable members), put those vars back into .rodata etc.
>
> Because we put all dynamic initialization from a single TU into one
> single function (well, originally one function per priority, but those
> are typically inlined back into one function), we can either take the
> simpler approach (from the PR it seems that is what LLVM uses), where we
> either manage to optimize all dynamic initializers in the TU into
> constants or none of them, or, by adding some markup - in this patch a
> pair of internal function calls - around each dynamic initialization
> that can be optimized, we can optimize each dynamic initialization
> separately.
>
> The patch adds a new pass that is invoked (through a gate check) only on
> DECL_ARTIFICIAL DECL_STATIC_CONSTRUCTOR functions, and looks there for
> sequences like:
>   .DYNAMIC_INIT_START (&b, 0);
>   b = 1;
>   .DYNAMIC_INIT_END (&b);
> or
>   .DYNAMIC_INIT_START (&e, 1);
>   # DEBUG this => &e.f
>   MEM[(struct S *)&e + 4B] ={v} {CLOBBER};
>   MEM[(struct S *)&e + 4B].a = 1;
>   MEM[(struct S *)&e + 4B].b = 2;
>   MEM[(struct S *)&e + 4B].c = 3;
>   # DEBUG BEGIN_STMT
>   MEM[(struct S *)&e + 4B].d = 6;
>   # DEBUG this => NULL
>   .DYNAMIC_INIT_END (&e);
> (where between the pair of markers everything is either a debug stmt or
> a store of a constant into the variable or parts of it).
> The pass needs to run late enough that after IPA all the needed constant
> propagation and perhaps loop unrolling has been done; on the other hand,
> it should run early enough that, if we can't optimize something, we can
> remove those .DYNAMIC_INIT_* internal calls, which could otherwise
> prevent further optimizations (they have an fnspec such that they
> pretend to read the corresponding variable).
>
> Currently the optimization is only able to handle cases where the whole
> variable is stored in a single store (typically scalar variables), or it
> uses the native_{encode,interpret}* infrastructure to create or update
> the CONSTRUCTOR.  This means that, except for the first category, we
> can't right now handle unions or anything that needs relocations (vars
> containing pointers to other vars or references).
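
For a concrete source-level picture, here is a minimal sketch of the kind
of initialization the pass can fold back (the names are invented for
illustration; the testcase at the end of the patch exercises the same
pattern):

  // Neither initializer is a constant expression to the FE, so both get
  // dynamic initialization; after inlining, the _GLOBAL__sub_I_* body
  // reduces to constant stores bracketed by the .DYNAMIC_INIT_* markers,
  // which the new pass can then rewrite back into DECL_INITIAL.
  inline int one () { return 1; }
  struct S { S () : a (1), b (2) {} int a, b; };
  int b = one ();  // optimizable into static initialization to 1
  const S s;       // optimizable into {.a=1, .b=2}, restoring TREE_READONLY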
> I think it would be nice to incrementally add, before the native_*
> fallback, some attempt to just create or update a CONSTRUCTOR if
> possible.  If we only see var.a.b.c.d[10].e = const; style stores, this
> shouldn't be that hard, as the whole access path is recorded there and
> we'd just need to decide what to do with unions when two or more union
> members are accessed.  And we'd do a deep copy of the CONSTRUCTOR and
> try to update the copy efficiently afterwards (the CONSTRUCTORs should
> be sorted by increasing offsets of the members/elements, so doing an
> ordered vec insertion might not be the best idea).  But MEM_REFs
> complicate this: part or all of the access path is lost.  For non-unions
> we could in most cases try to guess which field it is (do we have some
> existing function to do that?  I vaguely remember we've been doing that
> in some folding in the past but stopped doing so), but with unions it
> will be harder or impossible.
>
> As the middle-end can't easily differentiate between const variables
> without and with mutable members - both of those will have TREE_READONLY
> clear on the var decl (because of the dynamic initialization) and
> TYPE_READONLY set on the type - the patch remembers this in an extra
> argument to .DYNAMIC_INIT_START (true if it is ok to set TREE_READONLY
> back on the var decl if the var's dynamic initialization could be
> optimized into DECL_INITIAL).
> Thinking more about it, I'm not sure about const vars without mutable
> members but with non-trivial destructors: do we register their dtors
> dynamically through __cxa_atexit in the ctors (that would mean the
> optimization currently punts on them), or not (in which case we could
> put such a var into .rodata even when the dtor will perhaps want to
> write to it)?
>
> Anyway, I forgot to do another set of bootstraps that would gather
> statistics on how many vars were optimized, so I'm just trying to figure
> it out from the sizes of the _GLOBAL__sub_I_* functions:
>
> # Without patch, x86_64-linux cc1plus
> $ readelf -Ws obj50/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
> 13934
> # With the patch, x86_64-linux cc1plus
> $ readelf -Ws obj52/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
> 6966
> # Without patch, i686-linux cc1plus
> $ readelf -Ws obj51/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
> 24158
> # With the patch, i686-linux cc1plus
> $ readelf -Ws obj53/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
> 10536
>
> That seems like a huge improvement, although on a closer look most of
> that saving comes from just one TU:
> $ readelf -Ws obj50/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
> 6693
> $ readelf -Ws obj52/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
> 1
> $ readelf -Ws obj51/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
> 13001
> $ readelf -Ws obj53/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
> 1
> So the shrinkage of all the dynamic initialization functions other than
> the one in i386-options.o is:
> 7241 -> 6965 for 64-bit and
> 11157 -> 10535 for 32-bit.
> Will try to use constexpr for i386-options.c later today.
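
To make the union complication above concrete, a small invented example:
once the bytes of the store are all that is left, they no longer say
which member was assigned:

  // Both functions store the same bit pattern (1.0f is 0x3f800000, i.e.
  // 1065353216).  Whether the stores survive as u.i/u.f COMPONENT_REFs
  // or get lowered to a raw MEM_REF store of those bytes, the bytes are
  // identical; once the access path is gone, a CONSTRUCTOR for U can't
  // be rebuilt reliably, since {.i=1065353216} and {.f=1.0f} are both
  // consistent with it.
  union U { int i; float f; };
  U u;
  void set_i () { u.i = 1065353216; }
  void set_f () { u.f = 1.0f; }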
>
> Another optimization that could be useful, though I'm not sure it can be
> done easily: if the _GLOBAL__sub_I_* functions end up with nothing in
> their body before expansion (those are the 1-byte functions on x86),
> perhaps either don't emit those functions at all, or at least don't
> register them in .init_array etc., so that no cycles are wasted on them
> at runtime:
> $ readelf -Ws obj50/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
> 4
> $ readelf -Ws obj52/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
> 87
> $ readelf -Ws obj51/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
> 4
> $ readelf -Ws obj53/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
> 84
>
> Also, I wonder whether I should add some new -f* option to control the
> optimization, or whether always doing it at -O+ with
> -fdisable-tree-pass-dyninit as the way to disable it is good enough, and
> whether the hardcoded 1024 constant (an upper bound on the size of an
> optimized variable, so that we don't spend huge amounts of compile time
> trying to optimize initializers of gigabyte sizes) shouldn't be a param.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux.
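
As a rough illustration of the "don't register empty constructors" idea
quoted above, a minimal sketch (not part of the patch; the helper name is
invented, and where such a check would actually run - late IPA or
expansion time - is an open question):

  /* Sketch only: detect a _GLOBAL__sub_I_* body that became empty after
     optimization, i.e. contains nothing but debug stmts and a bare
     return.  A caller could then clear DECL_STATIC_CONSTRUCTOR on
     fun->decl so the function is never registered in .init_array.  */
  static bool
  ctor_body_empty_p (function *fun)
  {
    basic_block bb;
    FOR_EACH_BB_FN (bb, fun)
      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
           !gsi_end_p (gsi); gsi_next (&gsi))
        {
          gimple *stmt = gsi_stmt (gsi);
          if (is_gimple_debug (stmt) || gimple_nop_p (stmt))
            continue;
          if (greturn *ret = dyn_cast <greturn *> (stmt))
            if (gimple_return_retval (ret) == NULL_TREE)
              continue;
          return false;
        }
    return true;
  }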
As a general comment I wonder whether doing this fully in the C++
frontend, leveraging the constexpr support, would be a better approach,
esp. before we end up putting all initializers into a single function ...
even partly constexpr-evaluating things might help in some cases.

On that note it might be worth experimenting with keeping each
initializer in a separate function until IPA, where IPA could then figure
out dependences via IPA REFs (with LTO, on the whole program),
a) diagnosing inter-CU undefined behavior, b) "fixing" things by making
sure the initialization happens init-before-use (when there's no cycle),
c) with local analysis doing the promotion to READONLY at IPA time and
eliding the function.

I think most PRs really ask for more optimistic constexpr evaluation on
the frontend side.

Richard.

> 2021-11-04  Jakub Jelinek  <ja...@redhat.com>
>
> 	PR c++/102876
> gcc/
> 	* internal-fn.def (DYNAMIC_INIT_START, DYNAMIC_INIT_END): New
> 	internal functions.
> 	* internal-fn.c (expand_DYNAMIC_INIT_START,
> 	expand_DYNAMIC_INIT_END): New functions.
> 	* tree-pass.h (make_pass_dyninit): Declare.
> 	* passes.def (pass_dyninit): Add after dce4.
> 	* gimple-ssa-store-merging.c (pass_data_dyninit): New variable.
> 	(class pass_dyninit): New type.
> 	(pass_dyninit::execute): New method.
> 	(make_pass_dyninit): New function.
> gcc/cp/
> 	* decl2.c (one_static_initialization_or_destruction): Emit
> 	.DYNAMIC_INIT_START and .DYNAMIC_INIT_END internal calls around
> 	dynamic initialization of variables that don't need a guard.
> gcc/testsuite/
> 	* g++.dg/opt/init3.C: New test.
>
> --- gcc/internal-fn.def.jj	2021-11-02 09:05:47.029664211 +0100
> +++ gcc/internal-fn.def	2021-11-02 12:40:38.702436113 +0100
> @@ -367,6 +367,10 @@ DEF_INTERNAL_FN (PHI, 0, NULL)
>     automatic variable.  */
>  DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  
> +/* Mark start and end of dynamic initialization of a variable.  */
> +DEF_INTERNAL_FN (DYNAMIC_INIT_START, ECF_LEAF | ECF_NOTHROW, ". r ")
> +DEF_INTERNAL_FN (DYNAMIC_INIT_END, ECF_LEAF | ECF_NOTHROW, ". r ")
> +
>  /* DIM_SIZE and DIM_POS return the size of a particular compute
>     dimension and the executing thread's position within that
>     dimension.  DIM_POS is pure (and not const) so that it isn't
> --- gcc/internal-fn.c.jj	2021-11-02 09:05:47.029664211 +0100
> +++ gcc/internal-fn.c	2021-11-02 12:40:38.703436099 +0100
> @@ -3485,6 +3485,16 @@ expand_CO_ACTOR (internal_fn, gcall *)
>    gcc_unreachable ();
>  }
>  
> +static void
> +expand_DYNAMIC_INIT_START (internal_fn, gcall *)
> +{
> +}
> +
> +static void
> +expand_DYNAMIC_INIT_END (internal_fn, gcall *)
> +{
> +}
> +
>  /* Expand a call to FN using the operands in STMT.  FN has a single
>     output operand and NARGS input operands.  */
>
> --- gcc/tree-pass.h.jj	2021-10-28 11:29:01.891721153 +0200
> +++ gcc/tree-pass.h	2021-11-02 14:15:00.139185088 +0100
> @@ -445,6 +445,7 @@ extern gimple_opt_pass *make_pass_cse_re
>  extern gimple_opt_pass *make_pass_cse_sincos (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_optimize_bswap (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_store_merging (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_dyninit (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_optimize_widening_mul (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_warn_function_return (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_warn_function_noreturn (gcc::context *ctxt);
> --- gcc/passes.def.jj	2021-11-01 14:37:06.685853324 +0100
> +++ gcc/passes.def	2021-11-02 14:23:47.836715821 +0100
> @@ -261,6 +261,7 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_tsan);
>        NEXT_PASS (pass_dse);
>        NEXT_PASS (pass_dce);
> +      NEXT_PASS (pass_dyninit);
>        /* Pass group that runs when 1) enabled, 2) there are loops
>           in the function.  Make sure to run pass_fix_loops before
>           to discover/remove loops before running the gate function
> --- gcc/gimple-ssa-store-merging.c.jj	2021-09-01 12:06:19.488211919 +0200
> +++ gcc/gimple-ssa-store-merging.c	2021-11-03 18:02:55.190015359 +0100
> @@ -170,6 +170,8 @@
>  #include "optabs-tree.h"
>  #include "dbgcnt.h"
>  #include "selftest.h"
> +#include "cgraph.h"
> +#include "varasm.h"
>  
>  /* The maximum size (in bits) of the stores this pass should generate.  */
>  #define MAX_STORE_BITSIZE (BITS_PER_WORD)
> @@ -5465,6 +5467,334 @@ pass_store_merging::execute (function *f
>    return 0;
>  }
>  
> +/* Pass to optimize C++ dynamic initialization.  */
> +
> +const pass_data pass_data_dyninit = {
> +  GIMPLE_PASS, /* type */
> +  "dyninit", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_GIMPLE_STORE_MERGING, /* tv_id */
> +  PROP_ssa, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_dyninit : public gimple_opt_pass
> +{
> +public:
> +  pass_dyninit (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_dyninit, ctxt)
> +  {
> +  }
> +
> +  virtual bool
> +  gate (function *fun)
> +  {
> +    return (DECL_ARTIFICIAL (fun->decl)
> +            && DECL_STATIC_CONSTRUCTOR (fun->decl)
> +            && optimize);
> +  }
> +
> +  virtual unsigned int execute (function *);
> +}; // class pass_dyninit
> +
> +unsigned int
> +pass_dyninit::execute (function *fun)
> +{
> +  basic_block bb;
> +  auto_vec<gimple *, 32> ifns;
> +  hash_map<tree, gimple *> *map = NULL;
> +  auto_vec<tree, 32> vars;
> +  gimple **cur = NULL;
> +  bool ssdf_calls = false;
> +
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      for (gimple_stmt_iterator gsi = gsi_after_labels (bb);
> +           !gsi_end_p (gsi); gsi_next (&gsi))
> +        {
> +          gimple *stmt = gsi_stmt (gsi);
> +          if (is_gimple_debug (stmt))
> +            continue;
> +
> +          /* The C++ FE can wrap dynamic initialization of certain
> +             variables with a pair of internal function calls, like:
> +               .DYNAMIC_INIT_START (&b, 0);
> +               b = 1;
> +               .DYNAMIC_INIT_END (&b);
> +             or
> +               .DYNAMIC_INIT_START (&e, 1);
> +               # DEBUG this => &e.f
> +               MEM[(struct S *)&e + 4B] ={v} {CLOBBER};
> +               MEM[(struct S *)&e + 4B].a = 1;
> +               MEM[(struct S *)&e + 4B].b = 2;
> +               MEM[(struct S *)&e + 4B].c = 3;
> +               # DEBUG BEGIN_STMT
> +               MEM[(struct S *)&e + 4B].d = 6;
> +               # DEBUG this => NULL
> +               .DYNAMIC_INIT_END (&e);
> +             Verify that there are only stores of constants to the
> +             corresponding variable or parts of that variable, and if
> +             so, try to reconstruct a static initializer from the
> +             existing static initializer (if any) and the constant
> +             stores into the variable.  This is permitted by
> +             [basic.start.static]/3.  */
> +          if (is_gimple_call (stmt))
> +            {
> +              if (gimple_call_internal_p (stmt, IFN_DYNAMIC_INIT_START))
> +                {
> +                  ifns.safe_push (stmt);
> +                  if (cur)
> +                    *cur = NULL;
> +                  tree arg = gimple_call_arg (stmt, 0);
> +                  gcc_assert (TREE_CODE (arg) == ADDR_EXPR
> +                              && DECL_P (TREE_OPERAND (arg, 0)));
> +                  tree var = TREE_OPERAND (arg, 0);
> +                  gcc_checking_assert (is_global_var (var));
> +                  varpool_node *node = varpool_node::get (var);
> +                  if (node == NULL
> +                      || node->in_other_partition
> +                      || TREE_ASM_WRITTEN (var)
> +                      || DECL_SIZE_UNIT (var) == NULL_TREE
> +                      || !tree_fits_uhwi_p (DECL_SIZE_UNIT (var))
> +                      || tree_to_uhwi (DECL_SIZE_UNIT (var)) > 1024
> +                      || TYPE_SIZE_UNIT (TREE_TYPE (var)) == NULL_TREE
> +                      || !tree_int_cst_equal (TYPE_SIZE_UNIT (TREE_TYPE (var)),
> +                                              DECL_SIZE_UNIT (var)))
> +                    continue;
> +                  if (map == NULL)
> +                    map = new hash_map<tree, gimple *> (61);
> +                  bool existed_p;
> +                  cur = &map->get_or_insert (var, &existed_p);
> +                  if (existed_p)
> +                    {
> +                      /* Punt if we see more than one .DYNAMIC_INIT_START
> +                         internal call for the same variable.  */
> +                      *cur = NULL;
> +                      cur = NULL;
> +                    }
> +                  else
> +                    {
> +                      *cur = stmt;
> +                      vars.safe_push (var);
> +                    }
> +                  continue;
> +                }
> +              else if (gimple_call_internal_p (stmt, IFN_DYNAMIC_INIT_END))
> +                {
> +                  ifns.safe_push (stmt);
> +                  tree arg = gimple_call_arg (stmt, 0);
> +                  gcc_assert (TREE_CODE (arg) == ADDR_EXPR
> +                              && DECL_P (TREE_OPERAND (arg, 0)));
> +                  tree var = TREE_OPERAND (arg, 0);
> +                  gcc_checking_assert (is_global_var (var));
> +                  if (cur)
> +                    {
> +                      /* Punt if the .DYNAMIC_INIT_END call argument
> +                         doesn't pair with the .DYNAMIC_INIT_START one.  */
> +                      if (vars.last () != var)
> +                        *cur = NULL;
> +                      cur = NULL;
> +                    }
> +                  continue;
> +                }
> +
> +              /* Punt if we see any artificial
> +                 __static_initialization_and_destruction_* calls, e.g. if
> +                 it would be partially inlined, because we wouldn't then
> +                 see all the .DYNAMIC_INIT_* calls.  */
> +              tree fndecl = gimple_call_fndecl (stmt);
> +              if (fndecl
> +                  && DECL_ARTIFICIAL (fndecl)
> +                  && DECL_NAME (fndecl)
> +                  && startswith (IDENTIFIER_POINTER (DECL_NAME (fndecl)),
> +                                 "__static_initialization_and_destruction_"))
> +                ssdf_calls = true;
> +            }
> +          if (cur)
> +            {
> +              if (store_valid_for_store_merging_p (stmt))
> +                {
> +                  tree lhs = gimple_assign_lhs (stmt);
> +                  tree rhs = gimple_assign_rhs1 (stmt);
> +                  poly_int64 bitsize, bitpos;
> +                  HOST_WIDE_INT ibitsize, ibitpos;
> +                  machine_mode mode;
> +                  int unsignedp, reversep, volatilep = 0;
> +                  tree offset;
> +                  tree var = vars.last ();
> +                  if (rhs_valid_for_store_merging_p (rhs)
> +                      && get_inner_reference (lhs, &bitsize, &bitpos,
> +                                              &offset, &mode, &unsignedp,
> +                                              &reversep, &volatilep) == var
> +                      && !reversep
> +                      && !volatilep
> +                      && (offset == NULL_TREE || integer_zerop (offset))
> +                      && bitsize.is_constant (&ibitsize)
> +                      && bitpos.is_constant (&ibitpos)
> +                      && ibitpos >= 0
> +                      && ibitsize <= tree_to_shwi (DECL_SIZE (var))
> +                      && ibitsize + ibitpos <= tree_to_shwi (DECL_SIZE (var)))
> +                    continue;
> +                }
> +              *cur = NULL;
> +              cur = NULL;
> +            }
> +        }
> +      if (cur)
> +        {
> +          *cur = NULL;
> +          cur = NULL;
> +        }
> +    }
> +  if (map && !ssdf_calls)
> +    {
> +      for (tree var : vars)
> +        {
> +          gimple *g = *map->get (var);
> +          if (g == NULL)
> +            continue;
> +          varpool_node *node = varpool_node::get (var);
> +          node->get_constructor ();
> +          tree init = DECL_INITIAL (var);
> +          if (init == NULL)
> +            init = build_zero_cst (TREE_TYPE (var));
> +          gimple_stmt_iterator gsi = gsi_for_stmt (g);
> +          unsigned char *buf = NULL;
> +          unsigned int buf_size = tree_to_uhwi (DECL_SIZE_UNIT (var));
> +          bool buf_valid = false;
> +          do
> +            {
> +              gsi_next (&gsi);
> +              gimple *stmt = gsi_stmt (gsi);
> +              if (is_gimple_debug (stmt))
> +                continue;
> +              if (is_gimple_call (stmt))
> +                break;
> +              if (gimple_clobber_p (stmt))
> +                continue;
> +              tree lhs = gimple_assign_lhs (stmt);
> +              tree rhs = gimple_assign_rhs1 (stmt);
> +              if (lhs == var)
> +                {
> +                  /* Simple assignment to the whole variable.
> +                     rhs is the initializer.  */
> +                  buf_valid = false;
> +                  init = rhs;
> +                  continue;
> +                }
> +              poly_int64 bitsize, bitpos;
> +              machine_mode mode;
> +              int unsignedp, reversep, volatilep = 0;
> +              tree offset;
> +              get_inner_reference (lhs, &bitsize, &bitpos, &offset,
> +                                   &mode, &unsignedp, &reversep,
> +                                   &volatilep);
> +              HOST_WIDE_INT ibitsize = bitsize.to_constant ();
> +              HOST_WIDE_INT ibitpos = bitpos.to_constant ();
> +              if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
> +                  || CHAR_BIT != 8
> +                  || BITS_PER_UNIT != 8)
> +                {
> +                  g = NULL;
> +                  break;
> +                }
> +              if (!buf_valid)
> +                {
> +                  if (buf == NULL)
> +                    buf = XNEWVEC (unsigned char, buf_size * 2);
> +                  memset (buf, 0, buf_size);
> +                  if (native_encode_initializer (init, buf, buf_size)
> +                      != (int) buf_size)
> +                    {
> +                      g = NULL;
> +                      break;
> +                    }
> +                  buf_valid = true;
> +                }
> +              /* Otherwise go through the byte representation.  */
> +              if (!encode_tree_to_bitpos (rhs, buf, ibitsize,
> +                                          ibitpos, buf_size))
> +                {
> +                  g = NULL;
> +                  break;
> +                }
> +            }
> +          while (1);
> +          if (g == NULL)
> +            {
> +              XDELETE (buf);
> +              continue;
> +            }
> +          if (buf_valid)
> +            {
> +              init = native_interpret_aggregate (TREE_TYPE (var), buf, 0,
> +                                                 buf_size);
> +              if (init)
> +                {
> +                  /* Verify the dynamic initialization doesn't e.g. set
> +                     some padding bits to non-zero by trying to encode
> +                     it again and comparing.  */
> +                  memset (buf + buf_size, 0, buf_size);
> +                  if (native_encode_initializer (init, buf + buf_size,
> +                                                 buf_size) != (int) buf_size
> +                      || memcmp (buf, buf + buf_size, buf_size) != 0)
> +                    init = NULL_TREE;
> +                }
> +            }
> +          XDELETE (buf);
> +          if (!init || !initializer_constant_valid_p (init, TREE_TYPE (var)))
> +            continue;
> +          if (integer_nonzerop (gimple_call_arg (g, 1)))
> +            TREE_READONLY (var) = 1;
> +          if (dump_file)
> +            {
> +              fprintf (dump_file, "dynamic initialization of ");
> +              print_generic_stmt (dump_file, var, TDF_SLIM);
> +              fprintf (dump_file, " optimized into: ");
> +              print_generic_stmt (dump_file, init, TDF_SLIM);
> +              if (TREE_READONLY (var))
> +                fprintf (dump_file, " and making it read-only\n");
> +              fprintf (dump_file, "\n");
> +            }
> +          if (initializer_zerop (init))
> +            DECL_INITIAL (var) = NULL_TREE;
> +          else
> +            DECL_INITIAL (var) = init;
> +          gsi = gsi_for_stmt (g);
> +          gsi_next (&gsi);
> +          do
> +            {
> +              gimple *stmt = gsi_stmt (gsi);
> +              if (is_gimple_debug (stmt))
> +                {
> +                  gsi_next (&gsi);
> +                  continue;
> +                }
> +              if (is_gimple_call (stmt))
> +                break;
> +              /* Now remove all the stores for the dynamic
> +                 initialization.  */
> +              unlink_stmt_vdef (stmt);
> +              gsi_remove (&gsi, true);
> +              if (gimple_vdef (stmt))
> +                release_ssa_name (gimple_vdef (stmt));
> +            }
> +          while (1);
> +        }
> +    }
> +  delete map;
> +  for (gimple *g : ifns)
> +    {
> +      gimple_stmt_iterator gsi = gsi_for_stmt (g);
> +      unlink_stmt_vdef (g);
> +      gsi_remove (&gsi, true);
> +      if (gimple_vdef (g))
> +        release_ssa_name (gimple_vdef (g));
> +    }
> +  return 0;
> +}
>  } // anon namespace
>  
>  /* Construct and return a store merging pass object.  */
> @@ -5475,6 +5805,14 @@ make_pass_store_merging (gcc::context *c
>    return new pass_store_merging (ctxt);
>  }
>  
> +/* Construct and return a dyninit pass object.  */
> +
> +gimple_opt_pass *
> +make_pass_dyninit (gcc::context *ctxt)
> +{
> +  return new pass_dyninit (ctxt);
> +}
> +
>  #if CHECKING_P
>  
>  namespace selftest {
> --- gcc/cp/decl2.c.jj	2021-11-02 09:05:47.004664566 +0100
> +++ gcc/cp/decl2.c	2021-11-03 17:18:11.395288518 +0100
> @@ -4133,13 +4133,36 @@ one_static_initialization_or_destruction
>  {
>    if (init)
>      {
> +      bool sanitize = sanitize_flags_p (SANITIZE_ADDRESS, decl);
> +      if (optimize && guard == NULL_TREE && !sanitize)
> +        {
> +          tree t = build_fold_addr_expr (decl);
> +          tree type = TREE_TYPE (decl);
> +          tree is_const
> +            = constant_boolean_node (TYPE_READONLY (type)
> +                                     && !cp_has_mutable_p (type),
> +                                     boolean_type_node);
> +          t = build_call_expr_internal_loc (DECL_SOURCE_LOCATION (decl),
> +                                            IFN_DYNAMIC_INIT_START,
> +                                            void_type_node, 2, t,
> +                                            is_const);
> +          finish_expr_stmt (t);
> +        }
>        finish_expr_stmt (init);
> -      if (sanitize_flags_p (SANITIZE_ADDRESS, decl))
> +      if (sanitize)
>          {
>            varpool_node *vnode = varpool_node::get (decl);
>            if (vnode)
>              vnode->dynamically_initialized = 1;
>          }
> +      else if (optimize && guard == NULL_TREE)
> +        {
> +          tree t = build_fold_addr_expr (decl);
> +          t = build_call_expr_internal_loc (DECL_SOURCE_LOCATION (decl),
> +                                            IFN_DYNAMIC_INIT_END,
> +                                            void_type_node, 1, t);
> +          finish_expr_stmt (t);
> +        }
>      }
>  
>    /* If we're using __cxa_atexit, register a function that calls the
> --- gcc/testsuite/g++.dg/opt/init3.C.jj	2021-11-03 17:53:01.872472570 +0100
> +++ gcc/testsuite/g++.dg/opt/init3.C	2021-11-03 17:52:57.484535115 +0100
> @@ -0,0 +1,31 @@
> +// PR c++/102876
> +// { dg-do compile }
> +// { dg-options "-O2 -fdump-tree-dyninit" }
> +// { dg-final { scan-tree-dump "dynamic initialization of b\[\n\r]* optimized into: 1" "dyninit" } }
> +// { dg-final { scan-tree-dump "dynamic initialization of e\[\n\r]* optimized into: {.e=5, .f={.a=1, .b=2, .c=3, .d=6}, .g=6}\[\n\r]* and making it read-only" "dyninit" } }
> +// { dg-final { scan-tree-dump "dynamic initialization of f\[\n\r]* optimized into: {.e=7, .f={.a=1, .b=2, .c=3, .d=6}, .g=1}" "dyninit" } }
> +// { dg-final { scan-tree-dump "dynamic initialization of h\[\n\r]* optimized into: {.h=8, .i={.a=1, .b=2, .c=3, .d=6}, .j=9}" "dyninit" } }
> +// { dg-final { scan-tree-dump-times "dynamic initialization of " 4 "dyninit" } }
> +// { dg-final { scan-tree-dump-times "and making it read-only" 1 "dyninit" } }
> +
> +struct S { S () : a(1), b(2), c(3), d(4) { d += 2; } int a, b, c, d; };
> +struct T { int e; S f; int g; };
> +struct U { int h; mutable S i; int j; };
> +extern int b;
> +int foo (int &);
> +int bar (int &);
> +int baz () { return 1; }
> +int qux () { return b = 2; }
> +// Dynamic initialization of a shouldn't be optimized, foo can't be inlined.
> +int a = foo (b);
> +int b = baz ();
> +// Likewise for c.
> +int c = bar (b);
> +// While qux is inlined, the dynamic initialization modifies another
> +// variable, so punt for d as well.
> +int d = qux ();
> +const T e = { 5, S (), 6 };
> +T f = { 7, S (), baz () };
> +const T &g = e;
> +const U h = { 8, S (), 9 };
> +const U &i = h;
>
> 	Jakub
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)