Hi Marek,
On 13 Dec 2024, at 0:44, Marek Polacek wrote:
> On Thu, Dec 12, 2024 at 07:07:38PM +0000, Simon Martin wrote:
>> We currently ICE upon the following valid (under -Wno-vla) code
>>
>> === cut here ===
>> void f(int c) {
>> constexpr int r = 4;
>> [&](auto) { int t[r * c]; }(0);
>> }
>> === cut here ===
>>
>> The problem is that when parsing the lambda body, and more
>> specifically
>> the multiplication, we mark the lambda as
>> LAMBDA_EXPR_CAPTURE_OPTIMIZED
>> even though the replacement of r by 4 is "undone" by the call to
>> build_min_non_dep in build_x_binary_op. This makes
>> prune_lambda_captures
>
> Ah yeah, because build_min_non_dep gets the original operands.
>
>> remove the proxy declaration while it should not, and we trip on an
>> assert at instantiation time.
>>
>> This patch fixes the ICE by making sure that lambdas are only marked
>> as
>> LAMBDA_EXPR_CAPTURE_OPTIMIZED when they're instantiated (I tried
>> other
>> strategies like not undoing constant folding in build_min_non_dep,
>> but
>> it is pretty intrusive and breaks lots of things).
>
> I've tried that too and I also ran into a number of issues. I also
> tried
> checking p_t_d in prune_lambda_capture since it already says "Don't
> bother
> pruning in a template" but that doesn't work, either.
Yeah, the fundamental “problem” is that for lambdas that are not
within a template, we generate the closure before instantiating the
lambda function, so prune_lambda_capture thinks (without my patch) that
captures to constants have been folded out, which might be the case
(e.g. with ok_3 in the test I added) or not (e.g. in this PR) depending
on whether they’re part of an expression for which build_min_non_dep
has been called.
>> The test I added also shows that we don't always optimize out
>> captures
>> to constants for lambdas that are not within a template (see ok_2 for
>> example, or ok_3 that unlike ok_2 "regresses" a bit with my patch) -
>> I'm
>> curious if we consider it a problem or not? If so, I can try to fix
>> this
>> in a follow-up patch.
>
> Since "P0588R1: Simplifying implicit lambda capture" they are
> captures,
> though [expr.prim.lambda.capture] gives us license to optimize the
> captures if not ODR-used. I couldn't find a test where this patch
> would be an ABI change.
Thanks for the pointer! Yeah, whether we optimise out the captures or
not remains legit.
And if you look specifically at ok_3 from the test case (without the
patch, the capture is optimised out, with it it’s not anymore):
- I don’t think it’s an ABI change since we don’t do anything
with captures in mangle.cc.
- Since captures are only relevant within the function calling them,
we don’t have any issue if we link together object files generated
without the patch and object files generated with the patch.
>> Successfully tested on x86_64-pc-linux-gnu.
>>
>> PR c++/114292
>>
>> gcc/cp/ChangeLog:
>>
>> * cp-tree.h (mark_const_var_capture_optimized): Declare.
>> * expr.cc (mark_use): Call mark_const_var_capture_optimized.
>> * lambda.cc (mark_const_var_capture_optimized): New. Only set
>> LAMBDA_EXPR_CAPTURE_OPTIMIZED at lambda instantiation time.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.dg/cpp1y/lambda-ice4.C: New test.
>>
>> ---
>> gcc/cp/cp-tree.h | 1 +
>> gcc/cp/expr.cc | 10 ++----
>> gcc/cp/lambda.cc | 13 +++++++
>> gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C | 44
>> ++++++++++++++++++++++++
>> 4 files changed, 60 insertions(+), 8 deletions(-)
>> create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C
>>
>> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
>> index c5e0fc5c440..ce050032fdb 100644
>> --- a/gcc/cp/cp-tree.h
>> +++ b/gcc/cp/cp-tree.h
>> @@ -8058,6 +8058,7 @@ extern bool is_constant_capture_proxy
>> (tree);
>> extern void register_capture_members (tree);
>> extern tree lambda_expr_this_capture (tree, int);
>> extern void maybe_generic_this_capture (tree, tree);
>> +extern void mark_const_var_capture_optimized (void);
>> extern tree maybe_resolve_dummy (tree, bool);
>> extern tree current_nonlambda_function (void);
>> extern tree nonlambda_method_basetype (void);
>> diff --git a/gcc/cp/expr.cc b/gcc/cp/expr.cc
>> index de4991e616c..d6a2454c46e 100644
>> --- a/gcc/cp/expr.cc
>> +++ b/gcc/cp/expr.cc
>> @@ -120,10 +120,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
>> {
>> tree val = RECUR (cap);
>> if (!is_capture_proxy (val))
>> - {
>> - tree l = current_lambda_expr ();
>> - LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
>> - }
>> + mark_const_var_capture_optimized ();
>> return val;
>> }
>> }
>> @@ -171,10 +168,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
>> {
>> tree val = RECUR (cap);
>> if (!is_capture_proxy (val))
>> - {
>> - tree l = current_lambda_expr ();
>> - LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
>> - }
>> + mark_const_var_capture_optimized ();
>> return val;
>> }
>> }
>> diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
>> index d8a15d97d5d..4fd3b39c99b 100644
>> --- a/gcc/cp/lambda.cc
>> +++ b/gcc/cp/lambda.cc
>> @@ -945,6 +945,19 @@ resolvable_dummy_lambda (tree object)
>> return NULL_TREE;
>> }
>>
>> +/* Called when optimizing out a capture to a const variable. */
>> +
>> +void
>> +mark_const_var_capture_optimized ()
>> +{
>> + /* The actual optimizing out only occurs when instantiating the
>> lambda. */
>> + if (processing_template_decl)
>> + return;
>> +
>> + tree l = current_lambda_expr ();
>> + LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
>> +}
>
> If we do this, perhaps the function can be a static function in
> expr.cc,
> and maybe we can check processing_template_decl before
> is_capture_proxy.
Yeah, good suggestions, thanks!
That’s what the attached updated patch does - also successfully tested
on x86_64-pc-linux-gnu. Ok for trunk?
Simon
From 236d758c2746f2c5534698c13872741fe8b9b92e Mon Sep 17 00:00:00 2001
From: Simon Martin <si...@nasilyan.com>
Date: Thu, 12 Dec 2024 13:21:17 +0100
Subject: [PATCH] c++: Only prune capture proxies for constant variables at
instantiation time [PR114292]
We currently ICE upon the following valid (under -Wno-vla) code
=== cut here ===
void f(int c) {
constexpr int r = 4;
[&](auto) { int t[r * c]; }(0);
}
=== cut here ===
The problem is that when parsing the lambda body, and more specifically
the multiplication, we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED
even though the replacement of r by 4 is "undone" by the call to
build_min_non_dep in build_x_binary_op. This makes prune_lambda_captures
remove the proxy declaration while it should not, and we trip on an
assert at instantiation time.
This patch fixes the ICE by making sure that lambdas are only marked as
LAMBDA_EXPR_CAPTURE_OPTIMIZED when they're instantiated (I tried other
strategies like not undoing constant folding in build_min_non_dep, but
it is pretty intrusive and breaks lots of things).
The test I added also shows that we don't always optimize out captures
to constants for lambdas that are not within a template (see ok_2 for
example, or ok_3 that unlike ok_2 "regresses" a bit with my patch) - I'm
curious if we consider it a problem or not? If so, I can try to fix this
in a follow-up patch.
Successfully tested on x86_64-pc-linux-gnu.
PR c++/114292
gcc/cp/ChangeLog:
* expr.cc (maybe_mark_const_var_capture_optimized): New. Only
set LAMBDA_EXPR_CAPTURE_OPTIMIZED at lambda instantiation time.
(mark_use): Call maybe_mark_const_var_capture_optimized.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/lambda-ice4.C: New test.
---
gcc/cp/expr.cc | 31 +++++++++++------
gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C | 44 ++++++++++++++++++++++++
2 files changed, 65 insertions(+), 10 deletions(-)
create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C
diff --git a/gcc/cp/expr.cc b/gcc/cp/expr.cc
index de4991e616c..980b954154c 100644
--- a/gcc/cp/expr.cc
+++ b/gcc/cp/expr.cc
@@ -86,6 +86,25 @@ cplus_expand_constant (tree cst)
return cst;
}
+/* Called when folding out a capture C to a const variable to VALUE. */
+
+static void
+maybe_mark_const_var_capture_optimized (tree c, tree value)
+{
+ gcc_assert (is_constant_capture_proxy (c));
+
+ /* The actual optimizing out only occurs when instantiating the lambda. */
+ if (processing_template_decl)
+ return;
+
+ /* If the value is another capture, the capture is not optimized out. */
+ if (is_capture_proxy (value))
+ return;
+
+ tree l = current_lambda_expr ();
+ LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
+}
+
/* We've seen an actual use of EXPR. Possibly replace an outer variable
reference inside with its constant value or a lambda capture. */
@@ -119,11 +138,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
&& decl_constant_var_p (cap))
{
tree val = RECUR (cap);
- if (!is_capture_proxy (val))
- {
- tree l = current_lambda_expr ();
- LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
- }
+ maybe_mark_const_var_capture_optimized (expr, val);
return val;
}
}
@@ -170,11 +185,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
&& decl_constant_var_p (cap))
{
tree val = RECUR (cap);
- if (!is_capture_proxy (val))
- {
- tree l = current_lambda_expr ();
- LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
- }
+ maybe_mark_const_var_capture_optimized (ref, val);
return val;
}
}
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C
b/gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C
new file mode 100644
index 00000000000..fe8df52b827
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C
@@ -0,0 +1,44 @@
+// PR c++/114292
+// { dg-do "compile" { target c++14 } }
+// { dg-additional-options "-Wno-vla" }
+
+#define ASSERT_CAPTURE_NUMBER(Lambda, NumCaptures) \
+ { \
+ auto oneCapture = [&](auto) { int t[c]; }; \
+ const auto sizeOneCapture = sizeof (oneCapture); \
+ const auto expected = NumCaptures ? NumCaptures * sizeOneCapture : 1; \
+ static_assert (sizeof (Lambda) == expected, ""); \
+ }
+
+void foo (int c)
+{
+ constexpr int r = 4;
+
+ // This used to ICE.
+ [&](auto) { int t[c * r]; }(0);
+
+ // All those worked already, but were not covered by any test - do it here.
+ auto ok_1 = [&](auto) { int t[c]; };
+ ok_1 (0);
+ ASSERT_CAPTURE_NUMBER (ok_1, 1);
+
+ auto ok_2 = [&](auto) { int n = r * c; int t[n]; };
+ ok_2 (0);
+ ASSERT_CAPTURE_NUMBER (ok_2, 2);
+
+ auto ok_3 = [&](auto) { int t[r]; };
+ ok_3 (0);
+ ASSERT_CAPTURE_NUMBER (ok_3, 1); // Was 0 before the patch.
+
+ auto ok_4 = [&](auto) { int t[c * 4]; };
+ ok_4 (0);
+ ASSERT_CAPTURE_NUMBER (ok_4, 1);
+
+ auto ok_5 = [&](auto) { int t[1]; };
+ ok_5 (0);
+ ASSERT_CAPTURE_NUMBER (ok_5, 0);
+
+ auto ok_6 = [&](auto) { int t[c * r]; };
+ ok_6 (0);
+ ASSERT_CAPTURE_NUMBER (ok_6, 2);
+}
--
2.44.0