On 9/24/24 5:10 PM, Marek Polacek wrote:
On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
On 9/20/24 12:18 AM, Marek Polacek wrote:
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This PR reports a missed optimization.  When we have:

    Str str{"Test"};
    callback(str);

as in the test, we're able to evaluate the Str::Str() call at compile
time.  But when we have:

    callback(Str{"Test"});

we are not.  With this patch (in fact, it's Patrick's patch with a little
tweak), we turn

    callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
      5
      __ct_comp
      D.2890
      (struct Str *) <<< Unknown tree: void_cst >>>
      (const char *) "Test" >>>>)

into

    callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)

I explored the idea of calling maybe_constant_value for the whole
TARGET_EXPR in cp_fold.  That has three problems:
- we can't always elide a TARGET_EXPR, so we'd have to make sure the
    result is also a TARGET_EXPR;

I'd think that the result should always be a TARGET_EXPR for a class, and
that's the case we want to fold; a TARGET_EXPR for a scalar is always the
initialize-temp-and-use-it pattern you mention below.

Checking CLASS_TYPE_P would solve some of the problems, yes.  But...
- the resulting TARGET_EXPR must have the same flags, otherwise Bad
    Things happen;

I guess maybe_constant_value should be fixed to preserve flags regardless of
this change.

Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags,
but here we go into the break_out_target_exprs block in maybe_constant_value
and that doesn't necessarily preserve them.

- getting a new slot is also problematic.  I've seen a test where we
    had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
    would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
    D.2680, we can't replace it with D.2681, and things break.

Hmm, yeah.  Maybe only if TARGET_EXPR_IMPLICIT_P?

...unfortunately that doesn't always help.  I've reduced an example into:

   struct optional {
     constexpr optional(int) {}
   };
   optional foo() { return 2; }

where check_return_expr creates a COMPOUND_EXPR:

         retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
                          TREE_OPERAND (retval, 0));

where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.

FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a 
= {}"
is easy.  Previously, we would call C::C, so .gimple has:

    D.2590 = {};
    C::C (&D.2590);
    D.2597 = D.2590;
    return D.2597;

Then .einline inlines the C::C call:

    D.2590 = {};
    D.2590.a = {}; // #1
    D.2590.b = 0;  // #2
    D.2597 = D.2590;
    D.2590 ={v} {CLOBBER(eos)};
    return D.2597;

then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
passes.  But with the patch, .gimple won't have that C::C call, so the
IL is of course going to look different.

Maybe -fno-inline instead of the --param?

Then that C::C call isn't inlined and the test fails :/.
Unfortunately, pr78687.C is much more complicated and I can't explain
precisely what happens there.  But it seems like a good idea to have
a way to avoid this optimization.  So I've added the "noinline" check.

Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy
constructors are not constexpr.  And I wouldn't expect the optimization to
affect the value-initialization option_2().

In pr78687.C we do this new optimization only once for
"constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept (std::is_nothrow_constructible<T, 
U&&>::value)".

Aha.  Can we remove its constexpr?
...no, it's defaulted. And moving the defaulting after the class also breaks the testcase.

        PR c++/116416

gcc/cp/ChangeLog:

        * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
        TARGET_EXPR_INITIAL and replace it with the folded result if
        it's TREE_CONSTANT.

gcc/testsuite/ChangeLog:

        * g++.dg/analyzer/pr97116.C: Adjust dg-message.
        * g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
        * g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
        * g++.dg/tree-ssa/pr90883.C: Likewise.
        * g++.dg/cpp1y/constexpr-prvalue1.C: New test.

Co-authored-by: Patrick Palka <ppa...@redhat.com>
---
   gcc/cp/cp-gimplify.cc                         | 14 +++++++++
   gcc/testsuite/g++.dg/analyzer/pr97116.C       |  2 +-
   .../g++.dg/cpp1y/constexpr-prvalue1.C         | 29 +++++++++++++++++++
   gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
   gcc/testsuite/g++.dg/tree-ssa/pr78687.C       |  5 +++-
   gcc/testsuite/g++.dg/tree-ssa/pr90883.C       |  1 +
   6 files changed, 50 insertions(+), 3 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 003e68f1ea7..41d6333f650 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
         that case, strip it in favor of this one.  */
         if (tree &init = TARGET_EXPR_INITIAL (stmt))
        {
+         tree fn;
+         if ((data->flags & ff_genericize)
+             /* Give the user an option to opt out.  */
+             && !((fn = current_function_decl)
+                  && lookup_attribute ("noinline",
+                                       DECL_ATTRIBUTES (fn))))

This looks backwards from the usual sense of noinline, which suppresses
inlining of the function so marked, rather than inlining of other functions
called.  If you want to check noinline, you need to dig out the called
function (if any), perhaps with extract_call_expr.

Now I wish I hadn't done that.  But for implicit functions, like in
pr90883.C where we have

     class C
     {
         char a[7]{};
         int b{};
     };
there's nothing to make noinline anyway. So I'm stuck. I suppose
it wouldn't make sense to use -fno-fold-simple-inlines to disable
this optimization.  Can I abuse -fearly-inlining?  I don't want a new
flag for this :(.

Or should we just adjust the tests?

I think so, yes. Probably so that they test that the output doesn't contain the redundancy that the PR complains about; reaching optimal output by another path isn't a problem.

For pr78687, I guess that means checking the number of aggregate temporaries.

For pr90883, maybe check for the lack of ".a =" and ".b = "?

What is the .optimized for these two tests after your patch?

Jason

Reply via email to