On 9/25/24 4:21 PM, Marek Polacek wrote:
On Wed, Sep 25, 2024 at 08:54:46AM -0400, Jason Merrill wrote:
On 9/24/24 5:10 PM, Marek Polacek wrote:
On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
On 9/20/24 12:18 AM, Marek Polacek wrote:
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
-- >8 --
This PR reports a missed optimization. When we have:
Str str{"Test"};
callback(str);
as in the test, we're able to evaluate the Str::Str() call at compile
time. But when we have:
callback(Str{"Test"});
we are not. With this patch (in fact, it's Patrick's patch with a little
tweak), we turn
callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
5
__ct_comp
D.2890
(struct Str *) <<< Unknown tree: void_cst >>>
(const char *) "Test" >>>>)
into
callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
I explored the idea of calling maybe_constant_value for the whole
TARGET_EXPR in cp_fold. That has three problems:
- we can't always elide a TARGET_EXPR, so we'd have to make sure the
result is also a TARGET_EXPR;
I'd think that the result should always be a TARGET_EXPR for a class, and
that's the case we want to fold; a TARGET_EXPR for a scalar is always the
initialize-temp-and-use-it pattern you mention below.
Checking CLASS_TYPE_P would solve some of the problems, yes. But...
- the resulting TARGET_EXPR must have the same flags, otherwise Bad
Things happen;
I guess maybe_constant_value should be fixed to preserve flags regardless of
this change.
Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags,
but here we go into the break_out_target_exprs block in maybe_constant_value
and that doesn't necessarily preserve them.
- getting a new slot is also problematic. I've seen a test where we
had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
D.2680, we can't replace it with D.2681, and things break.
Hmm, yeah. Maybe only if TARGET_EXPR_IMPLICIT_P?
...unfortunately that doesn't always help. I've reduced an example into:
struct optional {
constexpr optional(int) {}
};
optional foo() { return 2; }
where check_return_expr creates a COMPOUND_EXPR:
retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
TREE_OPERAND (retval, 0));
where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted redundant store: .*.a
= {}"
is easy. Previously, we would call C::C, so .gimple has:
D.2590 = {};
C::C (&D.2590);
D.2597 = D.2590;
return D.2597;
Then .einline inlines the C::C call:
D.2590 = {};
D.2590.a = {}; // #1
D.2590.b = 0; // #2
D.2597 = D.2590;
D.2590 ={v} {CLOBBER(eos)};
return D.2597;
then #2 is removed in .fre1, and #1 is removed in .dse1. So the test
passes. But with the patch, .gimple won't have that C::C call, so the
IL is of course going to look different.
Maybe -fno-inline instead of the --param?
Then that C::C call isn't inlined and the test fails :/.
Unfortunately, pr78687.C is much more complicated and I can't explain
precisely what happens there. But it seems like a good idea to have
a way to avoid this optimization. So I've added the "noinline" check.
Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy
constructors are not constexpr. And I wouldn't expect the optimization to
affect the value-initialization option_2().
In pr78687.C we do this new optimization only once for
"constexpr eggs::variants::variant<Ts>::variant(U&&) noexcept (std::is_nothrow_constructible<T,
U&&>::value)".
Aha. Can we remove its constexpr?
...no, it's defaulted. And moving the defaulting after the class also
breaks the testcase.
PR c++/116416
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
TARGET_EXPR_INITIAL and replace it with the folded result if
it's TREE_CONSTANT.
gcc/testsuite/ChangeLog:
* g++.dg/analyzer/pr97116.C: Adjust dg-message.
* g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
* g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
* g++.dg/tree-ssa/pr90883.C: Likewise.
* g++.dg/cpp1y/constexpr-prvalue1.C: New test.
Co-authored-by: Patrick Palka <ppa...@redhat.com>
---
gcc/cp/cp-gimplify.cc | 14 +++++++++
gcc/testsuite/g++.dg/analyzer/pr97116.C | 2 +-
.../g++.dg/cpp1y/constexpr-prvalue1.C | 29 +++++++++++++++++++
gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C | 2 +-
gcc/testsuite/g++.dg/tree-ssa/pr78687.C | 5 +++-
gcc/testsuite/g++.dg/tree-ssa/pr90883.C | 1 +
6 files changed, 50 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 003e68f1ea7..41d6333f650 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1473,6 +1473,20 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
that case, strip it in favor of this one. */
if (tree &init = TARGET_EXPR_INITIAL (stmt))
{
+ tree fn;
+ if ((data->flags & ff_genericize)
+ /* Give the user an option to opt out. */
+ && !((fn = current_function_decl)
+ && lookup_attribute ("noinline",
+ DECL_ATTRIBUTES (fn))))
This looks backwards from the usual sense of noinline, which suppresses
inlining of the function so marked, rather than inlining of other functions
called. If you want to check noinline, you need to dig out the called
function (if any), perhaps with extract_call_expr.
Now I wish I hadn't done that. But for implicit functions, like in
pr90883.C where we have
class C
{
char a[7]{};
int b{};
};
there's nothing to make noinline anyway. So I'm stuck. I suppose
it wouldn't make sense to use -fno-fold-simple-inlines to disable
this optimization. Can I abuse -fearly-inlining? I don't want a new
flag for this :(.
Or should we just adjust the tests?
I think so, yes. Probably so that they test that the output doesn't contain
the redundancy that the PR complains about; reaching optimal output by
another path isn't a problem.
For pr78687, I guess that means checking the number of aggregate
temporaries.
For pr90883, maybe check for the lack of ".a =" and ".b = "?
Sure.
What is the .optimized for these two tests after your patch?
pr90883 is fine, the dumps are the same. In pr78687 they very much
aren't.
Let's see what's happening here. In .gimple, the difference is
in make_object_1:
struct ref_proxy make_object_1 ()
{
struct variant D.9472;
- struct option_2 D.9273;
try
{
- try
- {
- eggs::variants::variant<option_1, option_2>::variant<option_2> (&D.9472,
&D.9273);
- ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy
(&<retval>, D.9472);
- return <retval>;
- }
- finally
- {
- D.9273 = {CLOBBER(eos)};
- }
+ D.9472 = {};
+ D.9472._storage._which = 2;
+ ref_proxy<option_2, variant_ref<option_1, option_2> >::ref_proxy
(&<retval>, D.9472);
+ return <retval>;
that makes sense: the new optimization "inlined" the
eggs::variants::variant<option_1, option_2>::variant<option_2> call. So later
.einline has nothing to expand here whereas previously
eggs::variants::variant<option_1, option_2>::variant<option_2> (&D.9472,
&D.9273);
was expanded into
MEM[(struct _storage *)&D.9472] ={v} {CLOBBER(bob)};
MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
MEM[(struct _union *)&D.9472] ={v} {CLOBBER(bob)};
MEM[(struct _storage *)&D.9472]._which = 2;
Then make_object_1 gets inlined. Something happens in SRA. And
then we got rid of a lot of code. But now a lot of code remains.
Is it simply the fact that with this opt we expand the ctor into
D.9472 = {};
D.9472._storage._which = 2;
which means that we zero-init the variant object and then set _which to 2,
while previously we just allocated storage and set _which to 2?
That seems likely since the patch that fixed the bug before was dealing
with partially-initialized objects. Why does the optimization change that?
This is .optimized with the opt on:
struct ref_proxy f ()
{
struct ref_proxy ptr;
struct ref_proxy D.10036;
struct ref_proxy type;
struct ref_proxy type;
struct qual_option D.10031;
struct ref_proxy D.10030;
struct qual_option inner;
struct variant t;
struct variant D.10026;
struct variant D.10024;
struct inplace_ref D.10023;
struct inplace_ref ptr;
struct ref_proxy D.9898;
<bb 2> [local count: 1073741824]:
MEM <char[40]> [(struct variant *)&D.10024] = {};
D.10024._storage._which = 2;
D.10026 = D.10024;
t = D.10026;
MEM[(struct variant_ref *)&D.9898] ={v} {CLOBBER(bob)};
MEM[(struct variant_ref *)&D.9898].inner_storage_ = t;
t ={v} {CLOBBER(eos)};
D.10026 ={v} {CLOBBER(eos)};
D.10024 ={v} {CLOBBER(eos)};
MEM <size_t> [(struct ref_proxy *)&D.9898 + 40B] = 2;
D.10036 = D.9898;
ptr = D.10036;
MEM[(struct variant_ref *)&D.10030] ={v} {CLOBBER(bob)};
MEM[(struct variant_ref *)&D.10030].inner_storage_ =
ptr.D.9270.inner_storage_;
ptr ={v} {CLOBBER(eos)};
D.10036 ={v} {CLOBBER(eos)};
MEM <size_t> [(struct ref_proxy *)&D.10030 + 40B] = 2;
type = D.10030;
type = type;
MEM[(struct __as_base &)&D.10031] ={v} {CLOBBER(bob)};
D.10031.type_ = type;
type ={v} {CLOBBER(eos)};
type ={v} {CLOBBER(eos)};
MEM <size_t> [(struct qual_option *)&D.10031 + 40B] = 2;
D.10031.quals_ = 0;
inner = D.10031;
D.10023 ={v} {CLOBBER(bob)};
D.10023.inner_ = inner;
inner ={v} {CLOBBER(eos)};
D.10030 ={v} {CLOBBER(eos)};
D.10031 ={v} {CLOBBER(eos)};
MEM <size_t> [(struct inplace_ref *)&D.10023 + 40B] = 2;
MEM <int> [(struct inplace_ref *)&D.10023 + 48B] = 0;
ptr = D.10023;
<retval> ={v} {CLOBBER(bob)};
<retval>.D.9858 = ptr;
ptr ={v} {CLOBBER(eos)};
D.9898 ={v} {CLOBBER(eos)};
return <retval>;
}