Hi, On 13 Dec 2024, at 13:50, Simon Martin wrote:
> Hi Marek, > > On 13 Dec 2024, at 0:44, Marek Polacek wrote: > >> On Thu, Dec 12, 2024 at 07:07:38PM +0000, Simon Martin wrote: >>> We currently ICE upon the following valid (under -Wno-vla) code >>> >>> === cut here === >>> void f(int c) { >>> constexpr int r = 4; >>> [&](auto) { int t[r * c]; }(0); >>> } >>> === cut here === >>> >>> The problem is that when parsing the lambda body, and more >>> specifically >>> the multiplication, we mark the lambda as >>> LAMBDA_EXPR_CAPTURE_OPTIMIZED >>> even though the replacement of r by 4 is "undone" by the call to >>> build_min_non_dep in build_x_binary_op. This makes >>> prune_lambda_captures >> >> Ah yeah, because build_min_non_dep gets the original operands. >> >>> remove the proxy declaration while it should not, and we trip on an >>> assert at instantiation time. >>> >>> This patch fixes the ICE by making sure that lambdas are only marked >>> as >>> LAMBDA_EXPR_CAPTURE_OPTIMIZED when they're instantiated (I tried >>> other >>> strategies like not undoing constant folding in build_min_non_dep, > >>> but >>> it is pretty intrusive and breaks lots of things). >> >> I've tried that too and I also ran into a number of issues. I also > >> tried >> checking p_t_d in prune_lambda_capture since it already says "Don't > >> bother >> pruning in a template" but that doesn't work, either. > Yeah, the fundamental “problem” is that for lambdas that are not > within a template, we generate the closure before instantiating the > lambda function, so prune_lambda_capture thinks (without my patch) that > captures to constants have been folded out, which might be the case > (e.g. with ok_3 in the test I added) or not (e.g. in this PR) depending > on whether they’re part of an expression for which build_min_non_dep > has been called. > >>> The test I added also shows that we don't always optimize out >>> captures >>> to constants for lambdas that are not within a template (see ok_2 for > >>> example, or ok_3 that unlike ok_2 "regresses" a bit with my patch) - >>> I'm >>> curious if we consider it a problem or not? If so, I can try to fix >>> this >>> in a follow-up patch. >> >> Since "P0588R1: Simplifying implicit lambda capture" they are >> captures, >> though [expr.prim.lambda.capture] gives us license to optimize the >> captures if not ODR-used. I couldn't find a test where this patch >> would be an ABI change. > Thanks for the pointer! Yeah, whether we optimise out the captures or > > not remains legit. > > And if you look specifically at ok_3 from the test case (without the > patch, the capture is optimised out, with it it’s not anymore): > - I don’t think it’s an ABI change since we don’t do anything > with captures in mangle.cc. > - Since captures are only relevant within the function calling them, > we don’t have any issue if we link together object files generated > without the patch and object files generated with the patch. > >>> Successfully tested on x86_64-pc-linux-gnu. >>> >>> PR c++/114292 >>> >>> gcc/cp/ChangeLog: >>> >>> * cp-tree.h (mark_const_var_capture_optimized): Declare. >>> * expr.cc (mark_use): Call mark_const_var_capture_optimized. >>> * lambda.cc (mark_const_var_capture_optimized): New. Only set >>> LAMBDA_EXPR_CAPTURE_OPTIMIZED at lambda instantiation time. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * g++.dg/cpp1y/lambda-ice4.C: New test. >>> >>> --- >>> gcc/cp/cp-tree.h | 1 + >>> gcc/cp/expr.cc | 10 ++---- >>> gcc/cp/lambda.cc | 13 +++++++ >>> gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C | 44 >>> ++++++++++++++++++++++++ >>> 4 files changed, 60 insertions(+), 8 deletions(-) >>> create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C >>> >>> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h >>> index c5e0fc5c440..ce050032fdb 100644 >>> --- a/gcc/cp/cp-tree.h >>> +++ b/gcc/cp/cp-tree.h >>> @@ -8058,6 +8058,7 @@ extern bool is_constant_capture_proxy >>> (tree); >>> extern void register_capture_members (tree); >>> extern tree lambda_expr_this_capture (tree, int); >>> extern void maybe_generic_this_capture (tree, tree); >>> +extern void mark_const_var_capture_optimized (void); >>> extern tree maybe_resolve_dummy (tree, bool); >>> extern tree current_nonlambda_function (void); >>> extern tree nonlambda_method_basetype (void); >>> diff --git a/gcc/cp/expr.cc b/gcc/cp/expr.cc >>> index de4991e616c..d6a2454c46e 100644 >>> --- a/gcc/cp/expr.cc >>> +++ b/gcc/cp/expr.cc >>> @@ -120,10 +120,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p, > >>> { >>> tree val = RECUR (cap); >>> if (!is_capture_proxy (val)) >>> - { >>> - tree l = current_lambda_expr (); >>> - LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true; >>> - } >>> + mark_const_var_capture_optimized (); >>> return val; >>> } >>> } >>> @@ -171,10 +168,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p, > >>> { >>> tree val = RECUR (cap); >>> if (!is_capture_proxy (val)) >>> - { >>> - tree l = current_lambda_expr (); >>> - LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true; >>> - } >>> + mark_const_var_capture_optimized (); >>> return val; >>> } >>> } >>> diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc >>> index d8a15d97d5d..4fd3b39c99b 100644 >>> --- a/gcc/cp/lambda.cc >>> +++ b/gcc/cp/lambda.cc >>> @@ -945,6 +945,19 @@ resolvable_dummy_lambda (tree object) >>> return NULL_TREE; >>> } >>> >>> +/* Called when optimizing out a capture to a const variable. */ >>> + >>> +void >>> +mark_const_var_capture_optimized () >>> +{ >>> + /* The actual optimizing out only occurs when instantiating the > >>> lambda. */ >>> + if (processing_template_decl) >>> + return; >>> + >>> + tree l = current_lambda_expr (); >>> + LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true; >>> +} >> >> If we do this, perhaps the function can be a static function in >> expr.cc, >> and maybe we can check processing_template_decl before >> is_capture_proxy. > Yeah, good suggestions, thanks! > > That’s what the attached updated patch does - also successfully tested > on x86_64-pc-linux-gnu. Ok for trunk? Ping, thanks! Simon