On Thu, 4 Nov 2021, Jakub Jelinek wrote:

> Hi!
> 
> When users don't use constexpr everywhere in the initialization of
> namespace-scope non-comdat vars and the initializers aren't constant
> when the FE looks at them, the FE performs dynamic initialization of
> those variables.  But after inlining and some constant propagation, we
> often end up with just storing constants into those variables in the
> _GLOBAL__sub_I_* constructor.
> C++ gives us permission to change some of that dynamic initialization
> back into static initialization - https://eel.is/c++draft/basic.start.static#3
> For classes that need (dynamic) construction, I believe accessing some
> var from another dynamic construction before that var is constructed is
> UB, but as the example in the above mentioned spot of C++ shows:
> inline double fd() { return 1.0; }
> extern double d1;
> double d2 = d1;     // unspecified:
>                     // either statically initialized to 0.0 or
>                     // dynamically initialized to 0.0 if d1 is
>                     // dynamically initialized, or 1.0 otherwise
> double d1 = fd();   // either initialized statically or dynamically to 1.0
> some vars can be used before they are dynamically initialized and the
> implementation can still optimize those into static initialization.
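> For example (this is essentially the b = baz () case from the testcase
> below):
> inline int baz () { return 1; }
> int b = baz ();   // dynamic in the FE, but just b = 1 after inlining
> baz () isn't constexpr, so the FE emits dynamic initialization, yet
> after inlining the constructor merely stores 1 into b, which the above
> wording allows us to turn back into static initialization.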
> 
> The following patch attempts to optimize some such cases back into
> DECL_INITIAL initializers and, where possible (for originally const vars
> without mutable members), put those vars back into .rodata etc.
> 
> Because we put all dynamic initialization from a single TU into one single
> function (well, originally one function per priority, but those are
> typically inlined back into one function), we can either take a simpler
> approach (from the PR it seems that is what LLVM uses), where we optimize
> either all dynamic initializers in the TU into constants or none of them,
> or, by adding some markup - in the form of a pair of internal functions in
> this patch - around each dynamic initialization that can be optimized,
> we can optimize each dynamic initialization separately.
> 
> The patch adds a new pass that is invoked (through gate check) only on
> DECL_ARTIFICIAL DECL_STATIC_CONSTRUCTOR functions, and looks there for
> sequences like:
>   .DYNAMIC_INIT_START (&b, 0);
>   b = 1;
>   .DYNAMIC_INIT_END (&b);
> or
>   .DYNAMIC_INIT_START (&e, 1);
>   # DEBUG this => &e.f
>   MEM[(struct S *)&e + 4B] ={v} {CLOBBER};
>   MEM[(struct S *)&e + 4B].a = 1;
>   MEM[(struct S *)&e + 4B].b = 2;
>   MEM[(struct S *)&e + 4B].c = 3;
>   # DEBUG BEGIN_STMT
>   MEM[(struct S *)&e + 4B].d = 6;
>   # DEBUG this => NULL
>   .DYNAMIC_INIT_END (&e);
> (where between the pair of markers everything is either debug stmts or
> stores of constants into the variable or parts of it).
> The pass needs to run late enough so that all the needed constant
> propagation and perhaps loop unrolling has been done after IPA; on the
> other hand it should run early enough so that, if we can't optimize a
> variable, we can remove the .DYNAMIC_INIT* internal calls that could
> otherwise prevent further optimizations (they have an fnspec such that
> they pretend to read the corresponding variable).
> 
> Currently the optimization only handles cases where the whole variable
> is stored in a single store (typically scalar variables), or it uses the
> native_{encode,interpret}* infrastructure to create or update the
> CONSTRUCTOR.  This means that, except for the first category, we can't
> right now handle unions or anything that needs relocations (vars
> containing pointers to other vars or references).
> I think it would be nice to incrementally add, before the native_*
> fallback, some attempt to just create or update a CONSTRUCTOR where
> possible.  If we only see var.a.b.c.d[10].e = const; style of stores,
> this shouldn't be that hard, as the whole access path is recorded there
> and we'd just need to decide what to do with unions if two or more union
> members are accessed.  We'd also need to do a deep copy of the
> CONSTRUCTOR and try to update the copy efficiently afterwards (the
> CONSTRUCTORs should be sorted on increasing offsets of the
> members/elements, so doing an ordered vec insertion might not be the best
> idea).  But MEM_REFs complicate this, as parts or all of the access path
> are lost.  For non-unions in most cases we could try to guess which field
> it is (do we have some existing function to do that?  I vaguely remember
> we've been doing that in some cases in the past in some folding but stopped
> doing so) but with unions it will be harder or impossible.
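> Just to sketch the direction (not part of the patch, and ignoring nested
> access paths, unions and keeping the elements sorted), updating a
> CONSTRUCTOR copy for a single-level var.field = cst store could look
> like:
> 
> static tree
> update_ctor_field (tree ctor, tree field, tree cst)
> {
>   unsigned ix;
>   constructor_elt *ce;
>   /* Replace the value of an existing entry for FIELD, if any.  */
>   FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (ctor), ix, ce)
>     if (ce->index == field)
>       {
>         ce->value = cst;
>         return ctor;
>       }
>   /* Otherwise append a new entry (unsorted here for brevity).  */
>   CONSTRUCTOR_APPEND_ELT (CONSTRUCTOR_ELTS (ctor), field, cst);
>   return ctor;
> }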
> 
> As the middle-end can't easily differentiate between const variables
> with and without mutable members, both of those will have TREE_READONLY
> on the var decl clear (because of dynamic initialization) and
> TYPE_READONLY set on the type, so the patch remembers the distinction in
> an extra argument to .DYNAMIC_INIT_START (true if it is ok to set
> TREE_READONLY on the var decl back if the var's dynamic initialization
> could be optimized into DECL_INITIAL).
> Thinking more about it, I'm not sure about const vars without mutable
> members but with non-trivial destructors: do we register their dtors
> dynamically through __cxa_atexit in the ctors (which would mean the
> optimization currently punts on them), or not (in which case we could put
> them into .rodata even though the dtor will perhaps want to write to them)?
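> For example, in the testcase below both e and h have their dynamic
> initialization folded, but only e may become read-only again:
> const T e = { 5, S (), 6 };  // no mutable members, can go to .rodata
> const U h = { 8, S (), 9 };  // U::i is mutable, must stay writable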
> 
> Anyway, I forgot to do another set of bootstraps gathering statistics on
> how many vars were optimized, so I'm just trying to figure it out from
> the sizes of the _GLOBAL__sub_I_* functions:
> 
> # Without patch, x86_64-linux cc1plus
> $ readelf -Ws obj50/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
> 13934
> # With the patch, x86_64-linux cc1plus
> $ readelf -Ws obj52/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
> 6966
> # Without patch, i686-linux cc1plus
> $ readelf -Ws obj51/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
> 24158
> # With the patch, i686-linux cc1plus
> $ readelf -Ws obj53/gcc/cc1plus | grep ' _GLOBAL__sub_I_' | awk 'BEGIN{I=0}{I=I+$3}END{print I}'
> 10536
> 
> That seems like a huge improvement, although on a closer look, most of that
> saving is from just one TU:
> $ readelf -Ws obj50/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
> 6693
> $ readelf -Ws obj52/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
> 1
> $ readelf -Ws obj51/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
> 13001
> $ readelf -Ws obj53/gcc/i386-options.o | grep ' _GLOBAL__sub_I_' | awk '{print $3}'
> 1
> So, the shrinking on all the dynamic initialization functions except
> i386-options.o is:
> 7241 -> 6965 for 64-bit and
> 11157 -> 10535 for 32-bit.
> Will try to use constexpr for i386-options.c later today.
> 
> Another optimization that could be useful, though I'm not sure if it can
> be done easily: if before expansion the _GLOBAL__sub_I_* functions end up
> with nothing in their body (that's those 1-byte functions on x86), perhaps
> either don't emit those functions at all, or at least don't register them
> in .init_array etc., so that cycles aren't wasted at runtime:
> $ readelf -Ws obj50/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
> 4
> $ readelf -Ws obj52/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
> 87
> $ readelf -Ws obj51/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
> 4
> $ readelf -Ws obj53/gcc/{*,*/*}.o | grep ' _GLOBAL__sub_I_' | awk '($3 == 1){print $3}' | wc -l
> 84
> 
> Also, I wonder whether I should add some new -f* option to control the
> optimization, or whether doing it always at -O+ with
> -fdisable-tree-pass-dyninit as the way to disable it is good enough, and
> whether the hardcoded 1024 constant (an upper bound on the optimized
> variable's size, so that we don't spend huge amounts of compile time
> trying to optimize initializers of gigabyte sizes) shouldn't be a param.
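> If a param is preferred, I guess something like the following in
> params.opt would do (the param name here is just invented):
> -param=dyninit-max-size=
> Common Joined UInteger Var(param_dyninit_max_size) Init(1024) Param Optimization
> Maximum size in bytes of a variable whose dynamic initialization the
> dyninit pass will try to optimize into static initialization.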
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux.

As a general comment I wonder whether doing this fully in the C++
frontend, leveraging the constexpr support, is a better approach, esp.
before we end up putting all initializers into a single function ...
even partly constexpr evaluating things might help in some cases.

On that note it might be worth experimenting with keeping each
initializer in a separate function until IPA, where IPA could
then figure out dependences via IPA REFs (with LTO, on the whole
program): a) diagnosing inter-CU undefined behavior, b) "fixing"
things by making sure the initialization happens init-before-use
(when there's no cycle), c) with local analysis, doing the promotion
to READONLY at IPA time and eliding the function.

I think most PRs really ask for more optimistic constexpr
evaluation on the frontend side.
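
To illustrate (reusing baz/b from the testcase below), for

  inline int baz () { return 1; }
  int b = baz ();

the FE could simply try to constant-evaluate the initializer as if it
were constexpr and, when that succeeds without hitting anything
non-constant, emit static initialization directly instead of leaving
the folding to the middle-end.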

Richard.

> 2021-11-04  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR c++/102876
> gcc/
>       * internal-fn.def (DYNAMIC_INIT_START, DYNAMIC_INIT_END): New internal
>       functions.
>       * internal-fn.c (expand_DYNAMIC_INIT_START, expand_DYNAMIC_INIT_END):
>       New functions.
>       * tree-pass.h (make_pass_dyninit): Declare.
>       * passes.def (pass_dyninit): Add after dce4.
>       * gimple-ssa-store-merging.c (pass_data_dyninit): New variable.
>       (class pass_dyninit): New type.
>       (pass_dyninit::execute): New method.
>       (make_pass_dyninit): New function.
> gcc/cp/
>       * decl2.c (one_static_initialization_or_destruction): Emit
>       .DYNAMIC_INIT_START and .DYNAMIC_INIT_END internal calls around
>       dynamic initialization of variables that don't need a guard.
> gcc/testsuite/
>       * g++.dg/opt/init3.C: New test.
> 
> --- gcc/internal-fn.def.jj    2021-11-02 09:05:47.029664211 +0100
> +++ gcc/internal-fn.def       2021-11-02 12:40:38.702436113 +0100
> @@ -367,6 +367,10 @@ DEF_INTERNAL_FN (PHI, 0, NULL)
>     automatic variable.  */
>  DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  
> +/* Mark start and end of dynamic initialization of a variable.  */
> +DEF_INTERNAL_FN (DYNAMIC_INIT_START, ECF_LEAF | ECF_NOTHROW, ". r ")
> +DEF_INTERNAL_FN (DYNAMIC_INIT_END, ECF_LEAF | ECF_NOTHROW, ". r ")
> +
>  /* DIM_SIZE and DIM_POS return the size of a particular compute
>     dimension and the executing thread's position within that
>     dimension.  DIM_POS is pure (and not const) so that it isn't
> --- gcc/internal-fn.c.jj      2021-11-02 09:05:47.029664211 +0100
> +++ gcc/internal-fn.c 2021-11-02 12:40:38.703436099 +0100
> @@ -3485,6 +3485,16 @@ expand_CO_ACTOR (internal_fn, gcall *)
>    gcc_unreachable ();
>  }
>  
> +static void
> +expand_DYNAMIC_INIT_START (internal_fn, gcall *)
> +{
> +}
> +
> +static void
> +expand_DYNAMIC_INIT_END (internal_fn, gcall *)
> +{
> +}
> +
>  /* Expand a call to FN using the operands in STMT.  FN has a single
>     output operand and NARGS input operands.  */
>  
> --- gcc/tree-pass.h.jj        2021-10-28 11:29:01.891721153 +0200
> +++ gcc/tree-pass.h   2021-11-02 14:15:00.139185088 +0100
> @@ -445,6 +445,7 @@ extern gimple_opt_pass *make_pass_cse_re
>  extern gimple_opt_pass *make_pass_cse_sincos (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_optimize_bswap (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_store_merging (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_dyninit (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_optimize_widening_mul (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_warn_function_return (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_warn_function_noreturn (gcc::context *ctxt);
> --- gcc/passes.def.jj 2021-11-01 14:37:06.685853324 +0100
> +++ gcc/passes.def    2021-11-02 14:23:47.836715821 +0100
> @@ -261,6 +261,7 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_tsan);
>        NEXT_PASS (pass_dse);
>        NEXT_PASS (pass_dce);
> +      NEXT_PASS (pass_dyninit);
>        /* Pass group that runs when 1) enabled, 2) there are loops
>        in the function.  Make sure to run pass_fix_loops before
>        to discover/remove loops before running the gate function
> --- gcc/gimple-ssa-store-merging.c.jj 2021-09-01 12:06:19.488211919 +0200
> +++ gcc/gimple-ssa-store-merging.c    2021-11-03 18:02:55.190015359 +0100
> @@ -170,6 +170,8 @@
>  #include "optabs-tree.h"
>  #include "dbgcnt.h"
>  #include "selftest.h"
> +#include "cgraph.h"
> +#include "varasm.h"
>  
>  /* The maximum size (in bits) of the stores this pass should generate.  */
>  #define MAX_STORE_BITSIZE (BITS_PER_WORD)
> @@ -5465,6 +5467,334 @@ pass_store_merging::execute (function *f
>    return 0;
>  }
>  
> +/* Pass to optimize C++ dynamic initialization.  */
> +
> +const pass_data pass_data_dyninit = {
> +  GIMPLE_PASS,              /* type */
> +  "dyninit",                /* name */
> +  OPTGROUP_NONE,            /* optinfo_flags */
> +  TV_GIMPLE_STORE_MERGING,  /* tv_id */
> +  PROP_ssa,                 /* properties_required */
> +  0,                        /* properties_provided */
> +  0,                        /* properties_destroyed */
> +  0,                        /* todo_flags_start */
> +  0,                        /* todo_flags_finish */
> +};
> +
> +class pass_dyninit : public gimple_opt_pass
> +{
> +public:
> +  pass_dyninit (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_dyninit, ctxt)
> +  {
> +  }
> +
> +  virtual bool
> +  gate (function *fun)
> +  {
> +    return (DECL_ARTIFICIAL (fun->decl)
> +         && DECL_STATIC_CONSTRUCTOR (fun->decl)
> +         && optimize);
> +  }
> +
> +  virtual unsigned int execute (function *);
> +}; // class pass_dyninit
> +
> +unsigned int
> +pass_dyninit::execute (function *fun)
> +{
> +  basic_block bb;
> +  auto_vec<gimple *, 32> ifns;
> +  hash_map<tree, gimple *> *map = NULL;
> +  auto_vec<tree, 32> vars;
> +  gimple **cur = NULL;
> +  bool ssdf_calls = false;
> +
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      for (gimple_stmt_iterator gsi = gsi_after_labels (bb);
> +        !gsi_end_p (gsi); gsi_next (&gsi))
> +     {
> +       gimple *stmt = gsi_stmt (gsi);
> +       if (is_gimple_debug (stmt))
> +         continue;
> +
> +       /* The C++ FE can wrap dynamic initialization of certain
> +          variables with a pair of internal function calls, like:
> +          .DYNAMIC_INIT_START (&b, 0);
> +          b = 1;
> +          .DYNAMIC_INIT_END (&b);
> +
> +          or
> +          .DYNAMIC_INIT_START (&e, 1);
> +          # DEBUG this => &e.f
> +          MEM[(struct S *)&e + 4B] ={v} {CLOBBER};
> +          MEM[(struct S *)&e + 4B].a = 1;
> +          MEM[(struct S *)&e + 4B].b = 2;
> +          MEM[(struct S *)&e + 4B].c = 3;
> +          # DEBUG BEGIN_STMT
> +          MEM[(struct S *)&e + 4B].d = 6;
> +          # DEBUG this => NULL
> +          .DYNAMIC_INIT_END (&e);
> +
> +          Verify that there are only stores of constants into the
> +          corresponding variable or parts of that variable, and if so,
> +          try to reconstruct a static initializer from the original
> +          static initializer (if any) and the constant stores into
> +          the variable.  This is permitted by [basic.start.static]/3.  */
> +       if (is_gimple_call (stmt))
> +         {
> +           if (gimple_call_internal_p (stmt, IFN_DYNAMIC_INIT_START))
> +             {
> +               ifns.safe_push (stmt);
> +               if (cur)
> +                 *cur = NULL;
> +               tree arg = gimple_call_arg (stmt, 0);
> +               gcc_assert (TREE_CODE (arg) == ADDR_EXPR
> +                           && DECL_P (TREE_OPERAND (arg, 0)));
> +               tree var = TREE_OPERAND (arg, 0);
> +               gcc_checking_assert (is_global_var (var));
> +               varpool_node *node = varpool_node::get (var);
> +               if (node == NULL
> +                   || node->in_other_partition
> +                   || TREE_ASM_WRITTEN (var)
> +                   || DECL_SIZE_UNIT (var) == NULL_TREE
> +                   || !tree_fits_uhwi_p (DECL_SIZE_UNIT (var))
> +                   || tree_to_uhwi (DECL_SIZE_UNIT (var)) > 1024
> +                   || TYPE_SIZE_UNIT (TREE_TYPE (var)) == NULL_TREE
> +                   || !tree_int_cst_equal (TYPE_SIZE_UNIT (TREE_TYPE (var)),
> +                                           DECL_SIZE_UNIT (var)))
> +                 continue;
> +               if (map == NULL)
> +                 map = new hash_map<tree, gimple *> (61);
> +               bool existed_p;
> +               cur = &map->get_or_insert (var, &existed_p);
> +               if (existed_p)
> +                 {
> +                   /* Punt if we see more than one .DYNAMIC_INIT_START
> +                      internal call for the same variable.  */
> +                   *cur = NULL;
> +                   cur = NULL;
> +                 }
> +               else
> +                 {
> +                   *cur = stmt;
> +                   vars.safe_push (var);
> +                 }
> +               continue;
> +             }
> +           else if (gimple_call_internal_p (stmt, IFN_DYNAMIC_INIT_END))
> +             {
> +               ifns.safe_push (stmt);
> +               tree arg = gimple_call_arg (stmt, 0);
> +               gcc_assert (TREE_CODE (arg) == ADDR_EXPR
> +                           && DECL_P (TREE_OPERAND (arg, 0)));
> +               tree var = TREE_OPERAND (arg, 0);
> +               gcc_checking_assert (is_global_var (var));
> +               if (cur)
> +                 {
> +                   /* Punt if .DYNAMIC_INIT_END call argument doesn't
> +                      pair with .DYNAMIC_INIT_START.  */
> +                   if (vars.last () != var)
> +                     *cur = NULL;
> +                   cur = NULL;
> +                 }
> +               continue;
> +             }
> +
> +           /* Punt if we see any artificial
> +              __static_initialization_and_destruction_* calls, e.g. if
> +              one of them was partially inlined, because then we wouldn't
> +              see all the .DYNAMIC_INIT_* calls.  */
> +           tree fndecl = gimple_call_fndecl (stmt);
> +           if (fndecl
> +               && DECL_ARTIFICIAL (fndecl)
> +               && DECL_NAME (fndecl)
> +               && startswith (IDENTIFIER_POINTER (DECL_NAME (fndecl)),
> +                              "__static_initialization_and_destruction_"))
> +             ssdf_calls = true;
> +         }
> +       if (cur)
> +         {
> +           if (store_valid_for_store_merging_p (stmt))
> +             {
> +               tree lhs = gimple_assign_lhs (stmt);
> +               tree rhs = gimple_assign_rhs1 (stmt);
> +               poly_int64 bitsize, bitpos;
> +               HOST_WIDE_INT ibitsize, ibitpos;
> +               machine_mode mode;
> +               int unsignedp, reversep, volatilep = 0;
> +               tree offset;
> +               tree var = vars.last ();
> +               if (rhs_valid_for_store_merging_p (rhs)
> +                   && get_inner_reference (lhs, &bitsize, &bitpos, &offset,
> +                                           &mode, &unsignedp, &reversep,
> +                                           &volatilep) == var
> +                   && !reversep
> +                   && !volatilep
> +                   && (offset == NULL_TREE || integer_zerop (offset))
> +                   && bitsize.is_constant (&ibitsize)
> +                   && bitpos.is_constant (&ibitpos)
> +                   && ibitpos >= 0
> +                   && ibitsize <= tree_to_shwi (DECL_SIZE (var))
> +                   && ibitsize + ibitpos <= tree_to_shwi (DECL_SIZE (var)))
> +                 continue;
> +             }
> +           *cur = NULL;
> +           cur = NULL;
> +         }
> +     }
> +      if (cur)
> +     {
> +       *cur = NULL;
> +       cur = NULL;
> +     }
> +    }
> +  if (map && !ssdf_calls)
> +    {
> +      for (tree var : vars)
> +     {
> +       gimple *g = *map->get (var);
> +       if (g == NULL)
> +         continue;
> +       varpool_node *node = varpool_node::get (var);
> +       node->get_constructor ();
> +       tree init = DECL_INITIAL (var);
> +       if (init == NULL)
> +         init = build_zero_cst (TREE_TYPE (var));
> +       gimple_stmt_iterator gsi = gsi_for_stmt (g);
> +       unsigned char *buf = NULL;
> +       unsigned int buf_size = tree_to_uhwi (DECL_SIZE_UNIT (var));
> +       bool buf_valid = false;
> +       do
> +         {
> +           gsi_next (&gsi);
> +           gimple *stmt = gsi_stmt (gsi);
> +           if (is_gimple_debug (stmt))
> +             continue;
> +           if (is_gimple_call (stmt))
> +             break;
> +           if (gimple_clobber_p (stmt))
> +             continue;
> +           tree lhs = gimple_assign_lhs (stmt);
> +           tree rhs = gimple_assign_rhs1 (stmt);
> +           if (lhs == var)
> +             {
> +               /* Simple assignment to the whole variable.
> +                  rhs is the initializer.  */
> +               buf_valid = false;
> +               init = rhs;
> +               continue;
> +             }
> +           poly_int64 bitsize, bitpos;
> +           machine_mode mode;
> +           int unsignedp, reversep, volatilep = 0;
> +           tree offset;
> +           get_inner_reference (lhs, &bitsize, &bitpos, &offset,
> +                                &mode, &unsignedp, &reversep, &volatilep);
> +           HOST_WIDE_INT ibitsize = bitsize.to_constant ();
> +           HOST_WIDE_INT ibitpos = bitpos.to_constant ();
> +           if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
> +               || CHAR_BIT != 8
> +               || BITS_PER_UNIT != 8)
> +             {
> +               g = NULL;
> +               break;
> +             }
> +           if (!buf_valid)
> +             {
> +               if (buf == NULL)
> +                 buf = XNEWVEC (unsigned char, buf_size * 2);
> +               memset (buf, 0, buf_size);
> +               if (native_encode_initializer (init, buf, buf_size)
> +                   != (int) buf_size)
> +                 {
> +                   g = NULL;
> +                   break;
> +                 }
> +               buf_valid = true;
> +             }
> +           /* Otherwise go through byte representation.  */
> +           if (!encode_tree_to_bitpos (rhs, buf, ibitsize,
> +                                       ibitpos, buf_size))
> +             {
> +               g = NULL;
> +               break;
> +             }
> +         }
> +       while (1);
> +       if (g == NULL)
> +         {
> +           XDELETE (buf);
> +           continue;
> +         }
> +       if (buf_valid)
> +         {
> +           init = native_interpret_aggregate (TREE_TYPE (var), buf, 0,
> +                                              buf_size);
> +           if (init)
> +             {
> +               /* Verify the dynamic initialization doesn't e.g. set
> +                  some padding bits to non-zero by trying to encode
> +                  it again and comparing.  */
> +               memset (buf + buf_size, 0, buf_size);
> +               if (native_encode_initializer (init, buf + buf_size,
> +                                              buf_size) != (int) buf_size
> +                   || memcmp (buf, buf + buf_size, buf_size) != 0)
> +                 init = NULL_TREE;
> +             }
> +         }
> +       XDELETE (buf);
> +       if (!init || !initializer_constant_valid_p (init, TREE_TYPE (var)))
> +         continue;
> +       if (integer_nonzerop (gimple_call_arg (g, 1)))
> +         TREE_READONLY (var) = 1;
> +       if (dump_file)
> +         {
> +           fprintf (dump_file, "dynamic initialization of ");
> +           print_generic_stmt (dump_file, var, TDF_SLIM);
> +           fprintf (dump_file, " optimized into: ");
> +           print_generic_stmt (dump_file, init, TDF_SLIM);
> +           if (TREE_READONLY (var))
> +             fprintf (dump_file, " and making it read-only\n");
> +           fprintf (dump_file, "\n");
> +         }
> +       if (initializer_zerop (init))
> +         DECL_INITIAL (var) = NULL_TREE;
> +       else
> +         DECL_INITIAL (var) = init;
> +       gsi = gsi_for_stmt (g);
> +       gsi_next (&gsi);
> +       do
> +         {
> +           gimple *stmt = gsi_stmt (gsi);
> +           if (is_gimple_debug (stmt))
> +             {
> +               gsi_next (&gsi);
> +               continue;
> +             }
> +           if (is_gimple_call (stmt))
> +             break;
> +           /* Now remove all the stores for the dynamic initialization.  */
> +           unlink_stmt_vdef (stmt);
> +           gsi_remove (&gsi, true);
> +           if (gimple_vdef (stmt))
> +             release_ssa_name (gimple_vdef (stmt));
> +         }
> +       while (1);
> +     }
> +    }
> +  delete map;
> +  for (gimple *g : ifns)
> +    {
> +      gimple_stmt_iterator gsi = gsi_for_stmt (g);
> +      unlink_stmt_vdef (g);
> +      gsi_remove (&gsi, true);
> +      if (gimple_vdef (g))
> +     release_ssa_name (gimple_vdef (g));
> +    }
> +  return 0;
> +}
>  } // anon namespace
>  
>  /* Construct and return a store merging pass object.  */
> @@ -5475,6 +5805,14 @@ make_pass_store_merging (gcc::context *c
>    return new pass_store_merging (ctxt);
>  }
>  
> +/* Construct and return a dyninit pass object.  */
> +
> +gimple_opt_pass *
> +make_pass_dyninit (gcc::context *ctxt)
> +{
> +  return new pass_dyninit (ctxt);
> +}
> +
>  #if CHECKING_P
>  
>  namespace selftest {
> --- gcc/cp/decl2.c.jj 2021-11-02 09:05:47.004664566 +0100
> +++ gcc/cp/decl2.c    2021-11-03 17:18:11.395288518 +0100
> @@ -4133,13 +4133,36 @@ one_static_initialization_or_destruction
>      {
>        if (init)
>       {
> +       bool sanitize = sanitize_flags_p (SANITIZE_ADDRESS, decl);
> +       if (optimize && guard == NULL_TREE && !sanitize)
> +         {
> +           tree t = build_fold_addr_expr (decl);
> +           tree type = TREE_TYPE (decl);
> +           tree is_const
> +             = constant_boolean_node (TYPE_READONLY (type)
> +                                      && !cp_has_mutable_p (type),
> +                                      boolean_type_node);
> +           t = build_call_expr_internal_loc (DECL_SOURCE_LOCATION (decl),
> +                                             IFN_DYNAMIC_INIT_START,
> +                                             void_type_node, 2, t,
> +                                             is_const);
> +           finish_expr_stmt (t);
> +         }
>         finish_expr_stmt (init);
> -       if (sanitize_flags_p (SANITIZE_ADDRESS, decl))
> +       if (sanitize)
>           {
>             varpool_node *vnode = varpool_node::get (decl);
>             if (vnode)
>               vnode->dynamically_initialized = 1;
>           }
> +       else if (optimize && guard == NULL_TREE)
> +         {
> +           tree t = build_fold_addr_expr (decl);
> +           t = build_call_expr_internal_loc (DECL_SOURCE_LOCATION (decl),
> +                                             IFN_DYNAMIC_INIT_END,
> +                                             void_type_node, 1, t);
> +           finish_expr_stmt (t);
> +         }
>       }
>  
>        /* If we're using __cxa_atexit, register a function that calls the
> --- gcc/testsuite/g++.dg/opt/init3.C.jj       2021-11-03 17:53:01.872472570 +0100
> +++ gcc/testsuite/g++.dg/opt/init3.C  2021-11-03 17:52:57.484535115 +0100
> @@ -0,0 +1,31 @@
> +// PR c++/102876
> +// { dg-do compile }
> +// { dg-options "-O2 -fdump-tree-dyninit" }
> +// { dg-final { scan-tree-dump "dynamic initialization of b\[\n\r]* optimized into: 1" "dyninit" } }
> +// { dg-final { scan-tree-dump "dynamic initialization of e\[\n\r]* optimized into: {.e=5, .f={.a=1, .b=2, .c=3, .d=6}, .g=6}\[\n\r]* and making it read-only" "dyninit" } }
> +// { dg-final { scan-tree-dump "dynamic initialization of f\[\n\r]* optimized into: {.e=7, .f={.a=1, .b=2, .c=3, .d=6}, .g=1}" "dyninit" } }
> +// { dg-final { scan-tree-dump "dynamic initialization of h\[\n\r]* optimized into: {.h=8, .i={.a=1, .b=2, .c=3, .d=6}, .j=9}" "dyninit" } }
> +// { dg-final { scan-tree-dump-times "dynamic initialization of " 4 "dyninit" } }
> +// { dg-final { scan-tree-dump-times "and making it read-only" 1 "dyninit" } }
> +
> +struct S { S () : a(1), b(2), c(3), d(4) { d += 2; } int a, b, c, d; };
> +struct T { int e; S f; int g; };
> +struct U { int h; mutable S i; int j; };
> +extern int b;
> +int foo (int &);
> +int bar (int &);
> +int baz () { return 1; }
> +int qux () { return b = 2; }
> +// Dynamic initialization of a shouldn't be optimized, foo can't be inlined.
> +int a = foo (b);
> +int b = baz ();
> +// Likewise for c.
> +int c = bar (b);
> +// While qux is inlined, the dynamic initialization modifies another
> +// variable, so punt for d as well.
> +int d = qux ();
> +const T e = { 5, S (), 6 };
> +T f = { 7, S (), baz () };
> +const T &g = e;
> +const U h = { 8, S (), 9 };
> +const U &i = h;
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
