Hi,

On 13 Dec 2024, at 13:50, Simon Martin wrote:

> Hi Marek,
>
> On 13 Dec 2024, at 0:44, Marek Polacek wrote:
>
>> On Thu, Dec 12, 2024 at 07:07:38PM +0000, Simon Martin wrote:
>>> We currently ICE upon the following valid (under -Wno-vla) code
>>>
>>> === cut here ===
>>> void f(int c) {
>>>   constexpr int r = 4;
>>>   [&](auto) { int t[r * c]; }(0);
>>> }
>>> === cut here ===
>>>
>>> The problem is that when parsing the lambda body, and more
>>> specifically
>>> the multiplication, we mark the lambda as
>>> LAMBDA_EXPR_CAPTURE_OPTIMIZED
>>> even though the replacement of r by 4 is "undone" by the call to
>>> build_min_non_dep in build_x_binary_op. This makes
>>> prune_lambda_captures
>>
>> Ah yeah, because build_min_non_dep gets the original operands.
>>
>>> remove the proxy declaration while it should not, and we trip on an
>>> assert at instantiation time.
>>>
>>> This patch fixes the ICE by making sure that lambdas are only marked
>>> as
>>> LAMBDA_EXPR_CAPTURE_OPTIMIZED when they're instantiated (I tried
>>> other
>>> strategies like not undoing constant folding in build_min_non_dep,
>
>>> but
>>> it is pretty intrusive and breaks lots of things).
>>
>> I've tried that too and I also ran into a number of issues.  I also
>
>> tried
>> checking p_t_d in prune_lambda_capture since it already says "Don't
>
>> bother
>> pruning in a template" but that doesn't work, either.
> Yeah, the fundamental “problem” is that for lambdas that are not
> within a template, we generate the closure before instantiating the
> lambda function, so prune_lambda_capture thinks (without my patch) that
> captures to constants have been folded out, which might be the case
> (e.g. with ok_3 in the test I added) or not (e.g. in this PR) depending
> on whether they’re part of an expression for which build_min_non_dep
> has been called.
>
>>> The test I added also shows that we don't always optimize out
>>> captures
>>> to constants for lambdas that are not within a template (see ok_2 for
>
>>> example, or ok_3 that unlike ok_2 "regresses" a bit with my patch) -
>>> I'm
>>> curious if we consider it a problem or not? If so, I can try to fix
>>> this
>>> in a follow-up patch.
>>
>> Since "P0588R1: Simplifying implicit lambda capture" they are
>> captures,
>> though [expr.prim.lambda.capture] gives us license to optimize the
>> captures if not ODR-used.  I couldn't find a test where this patch
>> would be an ABI change.
> Thanks for the pointer! Yeah, whether we optimise out the captures or
>
> not remains legit.
>
> And if you look specifically at ok_3 from the test case (without the
> patch, the capture is optimised out, with it it’s not anymore):
>   - I don’t think it’s an ABI change since we don’t do anything
> with captures in mangle.cc.
>   - Since captures are only relevant within the function calling them,
> we don’t have any issue if we link together object files generated
> without the patch and object files generated with the patch.
>
>>> Successfully tested on x86_64-pc-linux-gnu.
>>>
>>>     PR c++/114292
>>>
>>> gcc/cp/ChangeLog:
>>>
>>>     * cp-tree.h (mark_const_var_capture_optimized): Declare.
>>>     * expr.cc (mark_use): Call mark_const_var_capture_optimized.
>>>     * lambda.cc (mark_const_var_capture_optimized): New. Only set
>>>     LAMBDA_EXPR_CAPTURE_OPTIMIZED at lambda instantiation time.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>     * g++.dg/cpp1y/lambda-ice4.C: New test.
>>>
>>> ---
>>>  gcc/cp/cp-tree.h                         |  1 +
>>>  gcc/cp/expr.cc                           | 10 ++----
>>>  gcc/cp/lambda.cc                         | 13 +++++++
>>>  gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C | 44
>>> ++++++++++++++++++++++++
>>>  4 files changed, 60 insertions(+), 8 deletions(-)
>>>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-ice4.C
>>>
>>> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
>>> index c5e0fc5c440..ce050032fdb 100644
>>> --- a/gcc/cp/cp-tree.h
>>> +++ b/gcc/cp/cp-tree.h
>>> @@ -8058,6 +8058,7 @@ extern bool is_constant_capture_proxy
>>> (tree);
>>>  extern void register_capture_members               (tree);
>>>  extern tree lambda_expr_this_capture            (tree, int);
>>>  extern void maybe_generic_this_capture             (tree, tree);
>>> +extern void mark_const_var_capture_optimized       (void);
>>>  extern tree maybe_resolve_dummy                    (tree, bool);
>>>  extern tree current_nonlambda_function             (void);
>>>  extern tree nonlambda_method_basetype              (void);
>>> diff --git a/gcc/cp/expr.cc b/gcc/cp/expr.cc
>>> index de4991e616c..d6a2454c46e 100644
>>> --- a/gcc/cp/expr.cc
>>> +++ b/gcc/cp/expr.cc
>>> @@ -120,10 +120,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
>
>>>         {
>>>           tree val = RECUR (cap);
>>>           if (!is_capture_proxy (val))
>>> -           {
>>> -             tree l = current_lambda_expr ();
>>> -             LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
>>> -           }
>>> +           mark_const_var_capture_optimized ();
>>>           return val;
>>>         }
>>>     }
>>> @@ -171,10 +168,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
>
>>>             {
>>>               tree val = RECUR (cap);
>>>               if (!is_capture_proxy (val))
>>> -               {
>>> -                 tree l = current_lambda_expr ();
>>> -                 LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
>>> -               }
>>> +               mark_const_var_capture_optimized ();
>>>               return val;
>>>             }
>>>         }
>>> diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
>>> index d8a15d97d5d..4fd3b39c99b 100644
>>> --- a/gcc/cp/lambda.cc
>>> +++ b/gcc/cp/lambda.cc
>>> @@ -945,6 +945,19 @@ resolvable_dummy_lambda (tree object)
>>>    return NULL_TREE;
>>>  }
>>>
>>> +/* Called when optimizing out a capture to a const variable.  */
>>> +
>>> +void
>>> +mark_const_var_capture_optimized ()
>>> +{
>>> +  /* The actual optimizing out only occurs when instantiating the
>
>>> lambda.  */
>>> +  if (processing_template_decl)
>>> +    return;
>>> +
>>> +  tree l = current_lambda_expr ();
>>> +  LAMBDA_EXPR_CAPTURE_OPTIMIZED (l) = true;
>>> +}
>>
>> If we do this, perhaps the function can be a static function in
>> expr.cc,
>> and maybe we can check processing_template_decl before
>> is_capture_proxy.
> Yeah, good suggestions, thanks!
>
> That’s what the attached updated patch does - also successfully tested
> on x86_64-pc-linux-gnu. Ok for trunk?
Ping, thanks!

Simon

Reply via email to