Re: [PATCH] c++: compile time evaluation of prvalues [PR116416]

Marek Polacek Wed, 25 Sep 2024 13:21:53 -0700

On Wed, Sep 25, 2024 at 08:54:46AM -0400, Jason Merrill wrote:
> On 9/24/24 5:10 PM, Marek Polacek wrote:
> > On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
> > > On 9/20/24 12:18 AM, Marek Polacek wrote:
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > This PR reports a missed optimization.  When we have:
> > > > 
> > > >     Str str{"Test"};
> > > >     callback(str);
> > > > 
> > > > as in the test, we're able to evaluate the Str::Str() call at compile
> > > > time.  But when we have:
> > > > 
> > > >     callback(Str{"Test"});
> > > > 
> > > > we are not.  With this patch (in fact, it's Patrick's patch with a 
> > > > little
> > > > tweak), we turn
> > > > 
> > > >     callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
> > > >       5
> > > >       __ct_comp
> > > >       D.2890
> > > >       (struct Str *) <<< Unknown tree: void_cst >>>
> > > >       (const char *) "Test" >>>>)
> > > > 
> > > > into
> > > > 
> > > >     callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", 
> > > > .length=4}>)
> > > > 
> > > > I explored the idea of calling maybe_constant_value for the whole
> > > > TARGET_EXPR in cp_fold.  That has three problems:
> > > > - we can't always elide a TARGET_EXPR, so we'd have to make sure the
> > > >     result is also a TARGET_EXPR;
> > > 
> > > I'd think that the result should always be a TARGET_EXPR for a class, and
> > > that's the case we want to fold; a TARGET_EXPR for a scalar is always the
> > > initialize-temp-and-use-it pattern you mention below.
> > 
> > Checking CLASS_TYPE_P would solve some of the problems, yes.  But...
> > > > - the resulting TARGET_EXPR must have the same flags, otherwise Bad
> > > >     Things happen;
> > > 
> > > I guess maybe_constant_value should be fixed to preserve flags regardless 
> > > of
> > > this change.
> > 
> > Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags,
> > but here we go into the break_out_target_exprs block in maybe_constant_value
> > and that doesn't necessarily preserve them.
> > 
> > > > - getting a new slot is also problematic.  I've seen a test where we
> > > >     had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole 
> > > > TARGET_EXPR
> > > >     would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the 
> > > > outer
> > > >     D.2680, we can't replace it with D.2681, and things break.
> > > 
> > > Hmm, yeah.  Maybe only if TARGET_EXPR_IMPLICIT_P?
> > 
> > ...unfortunately that doesn't always help.  I've reduced an example into:
> > 
> >    struct optional {
> >      constexpr optional(int) {}
> >    };
> >    optional foo() { return 2; }
> > 
> > where check_return_expr creates a COMPOUND_EXPR:
> > 
> >          retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
> >                           TREE_OPERAND (retval, 0));
> > 
> > where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
> > > > With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
> > > > 
> > > > FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted 
> > > > redundant store: .*.a = {}"
> > > > is easy.  Previously, we would call C::C, so .gimple has:
> > > > 
> > > >     D.2590 = {};
> > > >     C::C (&D.2590);
> > > >     D.2597 = D.2590;
> > > >     return D.2597;
> > > > 
> > > > Then .einline inlines the C::C call:
> > > > 
> > > >     D.2590 = {};
> > > >     D.2590.a = {}; // #1
> > > >     D.2590.b = 0;  // #2
> > > >     D.2597 = D.2590;
> > > >     D.2590 ={v} {CLOBBER(eos)};
> > > >     return D.2597;
> > > > 
> > > > then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
> > > > passes.  But with the patch, .gimple won't have that C::C call, so the
> > > > IL is of course going to look different.
> > > 
> > > Maybe -fno-inline instead of the --param?
> > 
> > Then that C::C call isn't inlined and the test fails :/.
> > > > Unfortunately, pr78687.C is much more complicated and I can't explain
> > > > precisely what happens there.  But it seems like a good idea to have
> > > > a way to avoid this optimization.  So I've added the "noinline" check.
> > > 
> > > Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy
> > > constructors are not constexpr.  And I wouldn't expect the optimization to
> > > affect the value-initialization option_2().
> > 
> > In pr78687.C we do this new optimization only once for
> > "constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept 
> > (std::is_nothrow_constructible<T, U&&>::value)".
> 
> Aha.  Can we remove its constexpr?
> ...no, it's defaulted.  And moving the defaulting after the class also
> breaks the testcase.
> 
> > > >         PR c++/116416
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > >         * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
> > > >         TARGET_EXPR_INITIAL and replace it with the folded result if
> > > >         it's TREE_CONSTANT.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > >         * g++.dg/analyzer/pr97116.C: Adjust dg-message.
> > > >         * g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
> > > >         * g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
> > > >         * g++.dg/tree-ssa/pr90883.C: Likewise.
> > > >         * g++.dg/cpp1y/constexpr-prvalue1.C: New test.
> > > > 
> > > > Co-authored-by: Patrick Palka <ppa...@redhat.com>
> > > > ---
> > > >    gcc/cp/cp-gimplify.cc                         | 14 +++++++++
> > > >    gcc/testsuite/g++.dg/analyzer/pr97116.C       |  2 +-
> > > >    .../g++.dg/cpp1y/constexpr-prvalue1.C         | 29 
> > > > +++++++++++++++++++
> > > >    gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
> > > >    gcc/testsuite/g++.dg/tree-ssa/pr78687.C       |  5 +++-
> > > >    gcc/testsuite/g++.dg/tree-ssa/pr90883.C       |  1 +
> > > >    6 files changed, 50 insertions(+), 3 deletions(-)
> > > >    create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
> > > > 
> > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > index 003e68f1ea7..41d6333f650 100644
> > > > --- a/gcc/cp/cp-gimplify.cc
> > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, 
> > > > void *data_)
> > > >          that case, strip it in favor of this one.  */
> > > >          if (tree &init = TARGET_EXPR_INITIAL (stmt))
> > > >         {
> > > > +         tree fn;
> > > > +         if ((data->flags & ff_genericize)
> > > > +             /* Give the user an option to opt out.  */
> > > > +             && !((fn = current_function_decl)
> > > > +                  && lookup_attribute ("noinline",
> > > > +                                       DECL_ATTRIBUTES (fn))))
> > > 
> > > This looks backwards from the usual sense of noinline, which suppresses
> > > inlining of the function so marked, rather than inlining of other 
> > > functions
> > > called.  If you want to check noinline, you need to dig out the called
> > > function (if any), perhaps with extract_call_expr.
> > 
> > Now I wish I hadn't done that.  But for implicit functions, like in
> > pr90883.C where we have
> > 
> >      class C
> >      {
> >          char a[7]{};
> >          int b{};
> >      };
> > there's nothing to make noinline anyway.  So I'm stuck.  I suppose
> > it wouldn't make sense to use -fno-fold-simple-inlines to disable
> > this optimization.  Can I abuse -fearly-inlining?  I don't want a new
> > flag for this :(.
> > 
> > Or should we just adjust the tests?
> 
> I think so, yes.  Probably so that they test that the output doesn't contain
> the redundancy that the PR complains about; reaching optimal output by
> another path isn't a problem.
> 
> For pr78687, I guess that means checking the number of aggregate
> temporaries.
> 
> For pr90883, maybe check for the lack of ".a =" and ".b = "?


Sure.
 
> What is the .optimized for these two tests after your patch?

pr90883 is fine, the dumps are the same.  In pr78687 they very much
aren't.

Let's see what's happening here.  In .gimple, the difference is
in make_object_1:

 struct ref_proxy make_object_1 ()
 {
   struct variant D.9472;
-  struct option_2 D.9273;

   try
     {
-      try
-        {
-          eggs::variants::variant<option_1, option_2>::variant<option_2> 
(&D.9472, &D.9273);
-          ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy 
(&<retval>, D.9472);
-          return <retval>;
-        }
-      finally
-        {
-          D.9273 = {CLOBBER(eos)};
-        }
+      D.9472 = {};
+      D.9472._storage._which = 2;
+      ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy 
(&<retval>, D.9472);
+      return <retval>;

that makes sense: the new optimization "inlined" the
eggs::variants::variant<option_1, option_2>::variant<option_2> call.  So later
.einline has nothing to expand here whereas previously 

  eggs::variants::variant<option_1, option_2>::variant<option_2> (&D.9472, 
&D.9273);

was expanded into

  MEM[(struct _storage *)&D.9472] ={v} {CLOBBER(bob)};
  MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
  MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
  MEM[(struct _storage *)&D.9472]._which = 2;

Then make_object_1 gets inlined.  Something happens in SRA.  And
then we got rid of a lot of code.  But now a lot of code remains.

Is it simply the fact that with this opt we expand the ctor into

  D.9472 = {};
  D.9472._storage._which = 2;

which means that we zero-init the variant object and then set _which to 2,
while previously we just allocated storage and set _which to 2?

This is .optimized with the opt on:

struct ref_proxy f ()
{
  struct ref_proxy ptr;
  struct ref_proxy D.10036;
  struct ref_proxy type;
  struct ref_proxy type;
  struct qual_option D.10031;
  struct ref_proxy D.10030;
  struct qual_option inner;
  struct variant t;
  struct variant D.10026;
  struct variant D.10024;
  struct inplace_ref D.10023;
  struct inplace_ref ptr;
  struct ref_proxy D.9898;

  <bb 2> [local count: 1073741824]:
  MEM <char[40]> [(struct variant *)&D.10024] = {};
  D.10024._storage._which = 2;
  D.10026 = D.10024;
  t = D.10026;
  MEM[(struct variant_ref *)&D.9898] ={v} {CLOBBER(bob)};
  MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;
  t ={v} {CLOBBER(eos)};
  D.10026 ={v} {CLOBBER(eos)};
  D.10024 ={v} {CLOBBER(eos)};
  MEM <size_t> [(struct ref_proxy *)&D.9898 + 40B] = 2;
  D.10036 = D.9898;
  ptr = D.10036;
  MEM[(struct variant_ref *)&D.10030] ={v} {CLOBBER(bob)};
  MEM[(struct variant_ref *)&D.10030].inner_storage_ = 
ptr.D.9270.inner_storage_;
  ptr ={v} {CLOBBER(eos)};
  D.10036 ={v} {CLOBBER(eos)};
  MEM <size_t> [(struct ref_proxy *)&D.10030 + 40B] = 2;
  type = D.10030;
  type = type;
  MEM[(struct __as_base  &)&D.10031] ={v} {CLOBBER(bob)};
  D.10031.type_ = type;
  type ={v} {CLOBBER(eos)};
  type ={v} {CLOBBER(eos)};
  MEM <size_t> [(struct qual_option *)&D.10031 + 40B] = 2;
  D.10031.quals_ = 0;
  inner = D.10031;
  D.10023 ={v} {CLOBBER(bob)};
  D.10023.inner_ = inner;
  inner ={v} {CLOBBER(eos)};
  D.10030 ={v} {CLOBBER(eos)};
  D.10031 ={v} {CLOBBER(eos)};
  MEM <size_t> [(struct inplace_ref *)&D.10023 + 40B] = 2;
  MEM <int> [(struct inplace_ref *)&D.10023 + 48B] = 0;
  ptr = D.10023;
  <retval> ={v} {CLOBBER(bob)};
  <retval>.D.9858 = ptr;
  ptr ={v} {CLOBBER(eos)};
  D.9898 ={v} {CLOBBER(eos)};
  return <retval>;

}

Re: [PATCH] c++: compile time evaluation of prvalues [PR116416]

Reply via email to