On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote: > On 9/20/24 12:18 AM, Marek Polacek wrote: > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > > > -- >8 -- > > This PR reports a missed optimization. When we have: > > > > Str str{"Test"}; > > callback(str); > > > > as in the test, we're able to evaluate the Str::Str() call at compile > > time. But when we have: > > > > callback(Str{"Test"}); > > > > we are not. With this patch (in fact, it's Patrick's patch with a little > > tweak), we turn > > > > callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr > > 5 > > __ct_comp > > D.2890 > > (struct Str *) <<< Unknown tree: void_cst >>> > > (const char *) "Test" >>>>) > > > > into > > > > callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>) > > > > I explored the idea of calling maybe_constant_value for the whole > > TARGET_EXPR in cp_fold. That has three problems: > > - we can't always elide a TARGET_EXPR, so we'd have to make sure the > > result is also a TARGET_EXPR; > > I'd think that the result should always be a TARGET_EXPR for a class, and > that's the case we want to fold; a TARGET_EXPR for a scalar is always the > initialize-temp-and-use-it pattern you mention below.
Checking CLASS_TYPE_P would solve some of the problems, yes. But... > > - the resulting TARGET_EXPR must have the same flags, otherwise Bad > > Things happen; > > I guess maybe_constant_value should be fixed to preserve flags regardless of > this change. Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags, but here we go into the break_out_target_exprs block in maybe_constant_value and that doesn't necessarily preserve them. > > - getting a new slot is also problematic. I've seen a test where we > > had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR > > would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer > > D.2680, we can't replace it with D.2681, and things break. > > Hmm, yeah. Maybe only if TARGET_EXPR_IMPLICIT_P? ...unfortunately that doesn't always help. I've reduced an example into: struct optional { constexpr optional(int) {} }; optional foo() { return 2; } where check_return_expr creates a COMPOUND_EXPR: retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval, TREE_OPERAND (retval, 0)); where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P. > > With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C. > > > > FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted redundant > > store: .*.a = {}" > > is easy. Previously, we would call C::C, so .gimple has: > > > > D.2590 = {}; > > C::C (&D.2590); > > D.2597 = D.2590; > > return D.2597; > > > > Then .einline inlines the C::C call: > > > > D.2590 = {}; > > D.2590.a = {}; // #1 > > D.2590.b = 0; // #2 > > D.2597 = D.2590; > > D.2590 ={v} {CLOBBER(eos)}; > > return D.2597; > > > > then #2 is removed in .fre1, and #1 is removed in .dse1. So the test > > passes. But with the patch, .gimple won't have that C::C call, so the > > IL is of course going to look different. > > Maybe -fno-inline instead of the --param? Then that C::C call isn't inlined and the test fails :/. > > Unfortunately, pr78687.C is much more complicated and I can't explain > > precisely what happens there. But it seems like a good idea to have > > a way to avoid this optimization. So I've added the "noinline" check. > > Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy > constructors are not constexpr. And I wouldn't expect the optimization to > affect the value-initialization option_2(). In pr78687.C we do this new optimization only once for "constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept (std::is_nothrow_constructible<T, U&&>::value)". > > PR c++/116416 > > > > gcc/cp/ChangeLog: > > > > * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold > > TARGET_EXPR_INITIAL and replace it with the folded result if > > it's TREE_CONSTANT. > > > > gcc/testsuite/ChangeLog: > > > > * g++.dg/analyzer/pr97116.C: Adjust dg-message. > > * g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus. > > * g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)). > > * g++.dg/tree-ssa/pr90883.C: Likewise. > > * g++.dg/cpp1y/constexpr-prvalue1.C: New test. > > > > Co-authored-by: Patrick Palka <ppa...@redhat.com> > > --- > > gcc/cp/cp-gimplify.cc | 14 +++++++++ > > gcc/testsuite/g++.dg/analyzer/pr97116.C | 2 +- > > .../g++.dg/cpp1y/constexpr-prvalue1.C | 29 +++++++++++++++++++ > > gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C | 2 +- > > gcc/testsuite/g++.dg/tree-ssa/pr78687.C | 5 +++- > > gcc/testsuite/g++.dg/tree-ssa/pr90883.C | 1 + > > 6 files changed, 50 insertions(+), 3 deletions(-) > > create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > > index 003e68f1ea7..41d6333f650 100644 > > --- a/gcc/cp/cp-gimplify.cc > > +++ b/gcc/cp/cp-gimplify.cc > > @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void > > *data_) > > that case, strip it in favor of this one. */ > > if (tree &init = TARGET_EXPR_INITIAL (stmt)) > > { > > + tree fn; > > + if ((data->flags & ff_genericize) > > + /* Give the user an option to opt out. */ > > + && !((fn = current_function_decl) > > + && lookup_attribute ("noinline", > > + DECL_ATTRIBUTES (fn)))) > > This looks backwards from the usual sense of noinline, which suppresses > inlining of the function so marked, rather than inlining of other functions > called. If you want to check noinline, you need to dig out the called > function (if any), perhaps with extract_call_expr. Now I wish I hadn't done that. But for implicit functions, like in pr90883.C where we have class C { char a[7]{}; int b{}; }; there's nothing to make noinline anyway. So I'm stuck. I suppose it wouldn't make sense to use -fno-fold-simple-inlines to disable this optimization. Can I abuse -fearly-inlining? I don't want a new flag for this :(. Or should we just adjust the tests? > This should also share conditions with the code that calls > maybe_constant_value for CALL_EXPR (flag_no_inline, in particular). Done. So I have --- a/gcc/cp/cp-gimplify.cc +++ b/gcc/cp/cp-gimplify.cc @@ -1473,6 +1473,16 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_) that case, strip it in favor of this one. */ if (tree &init = TARGET_EXPR_INITIAL (stmt)) { + if ((data->flags & ff_genericize) + && !flag_no_inline) + { + tree folded = maybe_constant_init (init, TARGET_EXPR_SLOT (stmt)); + if (folded != init && TREE_CONSTANT (folded)) + { + init = folded; + break; + } + } cp_walk_tree (&init, cp_fold_r, data, NULL); cp_walk_tree (&TARGET_EXPR_CLEANUP (stmt), cp_fold_r, data, NULL); *walk_subtrees = 0; But there are still these two fails as mentioned above: FAIL: g++.dg/tree-ssa/pr78687.C scan-tree-dump sra "Removing load:.*ptr;" FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"