On Thu, 26 Sep 2024, Marek Polacek wrote: > On Wed, Sep 25, 2024 at 08:45:50PM -0400, Jason Merrill wrote: > > On 9/25/24 4:21 PM, Marek Polacek wrote: > > > On Wed, Sep 25, 2024 at 08:54:46AM -0400, Jason Merrill wrote: > > > > On 9/24/24 5:10 PM, Marek Polacek wrote: > > > > > On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote: > > > > > > On 9/20/24 12:18 AM, Marek Polacek wrote: > > > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > > > > > > > > > > > > > -- >8 -- > > > > > > > This PR reports a missed optimization. When we have: > > > > > > > > > > > > > > Str str{"Test"}; > > > > > > > callback(str); > > > > > > > > > > > > > > as in the test, we're able to evaluate the Str::Str() call at > > > > > > > compile > > > > > > > time. But when we have: > > > > > > > > > > > > > > callback(Str{"Test"}); > > > > > > > > > > > > > > we are not. With this patch (in fact, it's Patrick's patch with > > > > > > > a little > > > > > > > tweak), we turn > > > > > > > > > > > > > > callback (TARGET_EXPR <D.2890, <<< Unknown tree: > > > > > > > aggr_init_expr > > > > > > > 5 > > > > > > > __ct_comp > > > > > > > D.2890 > > > > > > > (struct Str *) <<< Unknown tree: void_cst >>> > > > > > > > (const char *) "Test" >>>>) > > > > > > > > > > > > > > into > > > > > > > > > > > > > > callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", > > > > > > > .length=4}>) > > > > > > > > > > > > > > I explored the idea of calling maybe_constant_value for the whole > > > > > > > TARGET_EXPR in cp_fold. That has three problems: > > > > > > > - we can't always elide a TARGET_EXPR, so we'd have to make sure > > > > > > > the > > > > > > > result is also a TARGET_EXPR; > > > > > > > > > > > > I'd think that the result should always be a TARGET_EXPR for a > > > > > > class, and > > > > > > that's the case we want to fold; a TARGET_EXPR for a scalar is > > > > > > always the > > > > > > initialize-temp-and-use-it pattern you mention below. > > > > > > > > > > Checking CLASS_TYPE_P would solve some of the problems, yes. But... > > > > > > > - the resulting TARGET_EXPR must have the same flags, otherwise > > > > > > > Bad > > > > > > > Things happen; > > > > > > > > > > > > I guess maybe_constant_value should be fixed to preserve flags > > > > > > regardless of > > > > > > this change. > > > > > > > > > > Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR > > > > > flags, > > > > > but here we go into the break_out_target_exprs block in > > > > > maybe_constant_value > > > > > and that doesn't necessarily preserve them. > > > > > > > > > > > > - getting a new slot is also problematic. I've seen a test where > > > > > > > we > > > > > > > had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the > > > > > > > whole TARGET_EXPR > > > > > > > would get us "TARGET_EXPR<D.2681, ...>", but since we don't > > > > > > > see the outer > > > > > > > D.2680, we can't replace it with D.2681, and things break. > > > > > > > > > > > > Hmm, yeah. Maybe only if TARGET_EXPR_IMPLICIT_P? > > > > > > > > > > ...unfortunately that doesn't always help. I've reduced an example > > > > > into: > > > > > > > > > > struct optional { > > > > > constexpr optional(int) {} > > > > > }; > > > > > optional foo() { return 2; } > > > > > > > > > > where check_return_expr creates a COMPOUND_EXPR: > > > > > > > > > > retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval, > > > > > TREE_OPERAND (retval, 0)); > > > > > > > > > > where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P. > > > > > > > With this patch, two tree-ssa tests regressed: pr78687.C and > > > > > > > pr90883.C. > > > > > > > > > > > > > > FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted > > > > > > > redundant store: .*.a = {}" > > > > > > > is easy. Previously, we would call C::C, so .gimple has: > > > > > > > > > > > > > > D.2590 = {}; > > > > > > > C::C (&D.2590); > > > > > > > D.2597 = D.2590; > > > > > > > return D.2597; > > > > > > > > > > > > > > Then .einline inlines the C::C call: > > > > > > > > > > > > > > D.2590 = {}; > > > > > > > D.2590.a = {}; // #1 > > > > > > > D.2590.b = 0; // #2 > > > > > > > D.2597 = D.2590; > > > > > > > D.2590 ={v} {CLOBBER(eos)}; > > > > > > > return D.2597; > > > > > > > > > > > > > > then #2 is removed in .fre1, and #1 is removed in .dse1. So the > > > > > > > test > > > > > > > passes. But with the patch, .gimple won't have that C::C call, > > > > > > > so the > > > > > > > IL is of course going to look different. > > > > > > > > > > > > Maybe -fno-inline instead of the --param? > > > > > > > > > > Then that C::C call isn't inlined and the test fails :/. > > > > > > > Unfortunately, pr78687.C is much more complicated and I can't > > > > > > > explain > > > > > > > precisely what happens there. But it seems like a good idea to > > > > > > > have > > > > > > > a way to avoid this optimization. So I've added the "noinline" > > > > > > > check. > > > > > > > > > > > > Hmm, I'm surprised make_object_1 would be affected, since the > > > > > > ref_proxy > > > > > > constructors are not constexpr. And I wouldn't expect the > > > > > > optimization to > > > > > > affect the value-initialization option_2(). > > > > > > > > > > In pr78687.C we do this new optimization only once for > > > > > "constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept > > > > > (std::is_nothrow_constructible<T, U&&>::value)". > > > > > > > > Aha. Can we remove its constexpr? > > > > ...no, it's defaulted. And moving the defaulting after the class also > > > > breaks the testcase. > > FWIW, removing EGGS_CXX11_CONSTEXPR on line 360 "fixes" the test. > > > > > > > > PR c++/116416 > > > > > > > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > > > > > > > * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold > > > > > > > TARGET_EXPR_INITIAL and replace it with the folded result if > > > > > > > it's TREE_CONSTANT. > > > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > > > * g++.dg/analyzer/pr97116.C: Adjust dg-message. > > > > > > > * g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus. > > > > > > > * g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)). > > > > > > > * g++.dg/tree-ssa/pr90883.C: Likewise. > > > > > > > * g++.dg/cpp1y/constexpr-prvalue1.C: New test. > > > > > > > > > > > > > > Co-authored-by: Patrick Palka <ppa...@redhat.com> > > > > > > > --- > > > > > > > gcc/cp/cp-gimplify.cc | 14 +++++++++ > > > > > > > gcc/testsuite/g++.dg/analyzer/pr97116.C | 2 +- > > > > > > > .../g++.dg/cpp1y/constexpr-prvalue1.C | 29 > > > > > > > +++++++++++++++++++ > > > > > > > gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C | 2 +- > > > > > > > gcc/testsuite/g++.dg/tree-ssa/pr78687.C | 5 +++- > > > > > > > gcc/testsuite/g++.dg/tree-ssa/pr90883.C | 1 + > > > > > > > 6 files changed, 50 insertions(+), 3 deletions(-) > > > > > > > create mode 100644 > > > > > > > gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C > > > > > > > > > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > > > > > > > index 003e68f1ea7..41d6333f650 100644 > > > > > > > --- a/gcc/cp/cp-gimplify.cc > > > > > > > +++ b/gcc/cp/cp-gimplify.cc > > > > > > > @@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int > > > > > > > *walk_subtrees, void *data_) > > > > > > > that case, strip it in favor of this one. */ > > > > > > > if (tree &init = TARGET_EXPR_INITIAL (stmt)) > > > > > > > { > > > > > > > + tree fn; > > > > > > > + if ((data->flags & ff_genericize) > > > > > > > + /* Give the user an option to opt out. */ > > > > > > > + && !((fn = current_function_decl) > > > > > > > + && lookup_attribute ("noinline", > > > > > > > + DECL_ATTRIBUTES (fn)))) > > > > > > > > > > > > This looks backwards from the usual sense of noinline, which > > > > > > suppresses > > > > > > inlining of the function so marked, rather than inlining of other > > > > > > functions > > > > > > called. If you want to check noinline, you need to dig out the > > > > > > called > > > > > > function (if any), perhaps with extract_call_expr. > > > > > > > > > > Now I wish I hadn't done that. But for implicit functions, like in > > > > > pr90883.C where we have > > > > > > > > > > class C > > > > > { > > > > > char a[7]{}; > > > > > int b{}; > > > > > }; > > > > > there's nothing to make noinline anyway. So I'm stuck. I suppose > > > > > it wouldn't make sense to use -fno-fold-simple-inlines to disable > > > > > this optimization. Can I abuse -fearly-inlining? I don't want a new > > > > > flag for this :(. > > > > > > > > > > Or should we just adjust the tests? > > > > > > > > I think so, yes. Probably so that they test that the output doesn't > > > > contain > > > > the redundancy that the PR complains about; reaching optimal output by > > > > another path isn't a problem. > > > > > > > > For pr78687, I guess that means checking the number of aggregate > > > > temporaries. > > > > > > > > For pr90883, maybe check for the lack of ".a =" and ".b = "? > > > > > > Sure. > > > > What is the .optimized for these two tests after your patch? > > > > > > pr90883 is fine, the dumps are the same. In pr78687 they very much > > > aren't. > > > > > > Let's see what's happening here. In .gimple, the difference is > > > in make_object_1: > > > > > > struct ref_proxy make_object_1 () > > > { > > > struct variant D.9472; > > > - struct option_2 D.9273; > > > > > > try > > > { > > > - try > > > - { > > > - eggs::variants::variant<option_1, option_2>::variant<option_2> > > > (&D.9472, &D.9273); > > > - ref_proxy<option_2, variant_ref<option_1, option_2> > > > >::ref_proxy (&<retval>, D.9472); > > > - return <retval>; > > > - } > > > - finally > > > - { > > > - D.9273 = {CLOBBER(eos)}; > > > - } > > > + D.9472 = {}; > > > + D.9472._storage._which = 2; > > > + ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy > > > (&<retval>, D.9472); > > > + return <retval>; > > > > > > that makes sense: the new optimization "inlined" the > > > eggs::variants::variant<option_1, option_2>::variant<option_2> call. So > > > later > > > .einline has nothing to expand here whereas previously > > > > > > eggs::variants::variant<option_1, option_2>::variant<option_2> > > > (&D.9472, &D.9273); > > > > > > was expanded into > > > > > > MEM[(struct _storage *)&D.9472] ={v} {CLOBBER(bob)}; > > > MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)}; > > > MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)}; > > > MEM[(struct _storage *)&D.9472]._which = 2; > > > > > > Then make_object_1 gets inlined. Something happens in SRA. And > > > then we got rid of a lot of code. But now a lot of code remains. > > > > > > Is it simply the fact that with this opt we expand the ctor into > > > > > > D.9472 = {}; > > > D.9472._storage._which = 2; > > > > > > which means that we zero-init the variant object and then set _which to 2, > > > while previously we just allocated storage and set _which to 2? > > > > That seems likely since the patch that fixed the bug before was dealing with > > partially-initialized objects. Why does the optimization change that? > > CCing a few optimizer folks. To recap, we're talking about > tree-ssa/pr78687.C. > In in, there's: > > EGGS_CXX11_CONSTEXPR variant(U&& v) > noexcept( > std::is_nothrow_constructible<T, U&&>::value) > : _storage{detail::index<I + 1>{}, std::forward<U>(v)} > {} > > With a new C++ FE optimization this patch introduces, we can evaluate the call > at compile time. Then .gimple looks a little different (see above). What is > not clear is why we can't optimize the code as much as without this patch, > when the variant call isn't evaluated at compile time, and instead we produce > those MEMs as shown above. > > Any insights would be appreciated. > > This is .optimized with the opt on: > > __attribute__((noinline)) > struct ref_proxy f () > { > struct ref_proxy ptr; > struct ref_proxy D.10036; > struct ref_proxy type; > struct ref_proxy type; > struct qual_option D.10031; > struct ref_proxy D.10030; > struct qual_option inner; > struct variant t; > struct variant D.10026; > struct variant D.10024; > struct inplace_ref D.10023; > struct inplace_ref ptr; > struct ref_proxy D.9898; > > <bb 2> [local count: 1073741824]: > MEM <char[40]> [(struct variant *)&D.10024] = {};
Without actually checking it might be that SRA chokes on the above. The IL is basically a huge chain of aggregate copies interspersed with clobbers and occasional scalar inits and I fear that we really only have SRA dealing with this. Is there any reason to use the char[40] init instead of a aggregate {} init of type variant? I would suggest to open a bugreport. > D.10024._storage._which = 2; > D.10026 = D.10024; > t = D.10026; > MEM[(struct variant_ref *)&D.9898] ={v} {CLOBBER(bob)}; > MEM[(struct variant_ref *)&D.9898].inner_storage_ = t; > t ={v} {CLOBBER(eos)}; > D.10026 ={v} {CLOBBER(eos)}; > D.10024 ={v} {CLOBBER(eos)}; > MEM <size_t> [(struct ref_proxy *)&D.9898 + 40B] = 2; > D.10036 = D.9898; > ptr = D.10036; > MEM[(struct variant_ref *)&D.10030] ={v} {CLOBBER(bob)}; > MEM[(struct variant_ref *)&D.10030].inner_storage_ = > ptr.D.9270.inner_storage_; > ptr ={v} {CLOBBER(eos)}; > D.10036 ={v} {CLOBBER(eos)}; > MEM <size_t> [(struct ref_proxy *)&D.10030 + 40B] = 2; > type = D.10030; > type = type; > MEM[(struct __as_base &)&D.10031] ={v} {CLOBBER(bob)}; > D.10031.type_ = type; > type ={v} {CLOBBER(eos)}; > type ={v} {CLOBBER(eos)}; > MEM <size_t> [(struct qual_option *)&D.10031 + 40B] = 2; > D.10031.quals_ = 0; > inner = D.10031; > D.10023 ={v} {CLOBBER(bob)}; > D.10023.inner_ = inner; > inner ={v} {CLOBBER(eos)}; > D.10030 ={v} {CLOBBER(eos)}; > D.10031 ={v} {CLOBBER(eos)}; > MEM <size_t> [(struct inplace_ref *)&D.10023 + 40B] = 2; > MEM <int> [(struct inplace_ref *)&D.10023 + 48B] = 0; > ptr = D.10023; > <retval> ={v} {CLOBBER(bob)}; > <retval>.D.9858 = ptr; > ptr ={v} {CLOBBER(eos)}; > D.9898 ={v} {CLOBBER(eos)}; > return <retval>; > > } > > and this is the result without this patch: > > __attribute__((noinline)) > struct ref_proxy f () > { > <bb 2> [local count: 1073741824]: > <retval> ={v} {CLOBBER(bob)}; > MEM <size_t> [(struct inplace_ref *)&<retval> + 40B] = 2; > MEM <int> [(struct inplace_ref *)&<retval> + 48B] = 0; > return <retval>; > > } > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)