[PATCH] D153296: [AST] Stop evaluate constant expression if the condition expression which in switch statement contains errors

Haojian Wu via Phabricator via cfe-commits Thu, 06 Jul 2023 01:52:00 -0700

hokein added inline comments.


================
Comment at: clang/lib/AST/ExprConstant.cpp:4921
+  // value is.
+  if (isa<RecoveryExpr>(E))
+    return false;
----------------
aaron.ballman wrote:
> hokein wrote:
> > hokein wrote:
> > > hokein wrote:
> > > > yronglin wrote:
> > > > > yronglin wrote:
> > > > > > aaron.ballman wrote:
> > > > > > > yronglin wrote:
> > > > > > > > hokein wrote:
> > > > > > > > > The constant evaluator is not aware of the "error" concept, 
> > > > > > > > > it is only aware of value-dependent -- the general idea 
> > > > > > > > > behind is that we treat the dependent-on-error and 
> > > > > > > > > dependent-on-template-parameters cases the same, they are 
> > > > > > > > > potentially constant (if we see an expression contains 
> > > > > > > > > errors, it could be constant depending on how the error is 
> > > > > > > > > resolved), this will give us nice recovery and avoid bogus 
> > > > > > > > > following diagnostics.
> > > > > > > > > 
> > > > > > > > > So, a `RecoveryExpr` should not result in failure when 
> > > > > > > > > checking for a potential constant expression.
> > > > > > > > > 
> > > > > > > > > I think the right fix is to remove the conditional check `if 
> > > > > > > > > (!EvaluateDependentExpr(SS->getCond(), Info))` in 
> > > > > > > > > `EvaluateSwitch`, and return `ESR_Failed` unconditionally (we 
> > > > > > > > > don't know its value, any switch-case anwser will be wrong in 
> > > > > > > > > some cases). We already do this for return-statment, 
> > > > > > > > > do-statement etc.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > Do you mean?
> > > > > > > > ```
> > > > > > > > if (SS->getCond()->isValueDependent()) {
> > > > > > > >     EvaluateDependentExpr(SS->getCond(), Info);
> > > > > > > >     return ESR_Failed;
> > > > > > > > }
> > > > > > > > ```
> > > > > > > > the general idea behind is that we treat the dependent-on-error 
> > > > > > > > and dependent-on-template-parameters cases the same, they are 
> > > > > > > > potentially constant (if we see an expression contains errors, 
> > > > > > > > it could be constant depending on how the error is resolved), 
> > > > > > > > this will give us nice recovery and avoid bogus following 
> > > > > > > > diagnostics.
> > > > > > > 
> > > > > > > I could use some further education on why this is the correct 
> > > > > > > approach. For a dependent-on-template-parameters case, this makes 
> > > > > > > sense -- either the template will be instantiated (at which point 
> > > > > > > we'll know if it's a constant expression) or it won't be (at 
> > > > > > > which point it's constant expression-ness doesn't matter). But 
> > > > > > > for error recovery, we will *never* get a valid constant 
> > > > > > > expression.
> > > > > > > 
> > > > > > > I worry about the performance overhead of continuing on with 
> > > > > > > constant expression evaluation in the error case. We use these 
> > > > > > > code paths not only to get a value but to say "is this a constant 
> > > > > > > expression at all?".
> > > > > > > 
> > > > > > > I don't see why the fix should be localized to just the switch 
> > > > > > > statement condition; it seems like *any* attempt to get a 
> > > > > > > dependent value from an error recovery expression is a point at 
> > > > > > > which we can definitively say "this is not a constant expression" 
> > > > > > > and move on.
> > > > > > I understand that continuing to perform constant evaluation when an 
> > > > > > error occurs can bring more additional diagnostic information (such 
> > > > > > as jumping to the default branch to continue calculation when the 
> > > > > > condition expression evaluation of switch-statement fails), but the 
> > > > > > additional diagnostic message that is emitted is in some cases 
> > > > > > doesn't usually useful, and as Aaron said may affect performance of 
> > > > > > clang. I don't have enough experience to make a tradeoff between 
> > > > > > the two. BTW 
> > > > > > https://github.com/llvm/llvm-project/blob/843ff7581408a02e852c0f1f7ebf176cabbc7527/clang/lib/Parse/ParseStmt.cpp#L1894-L1909
> > > > > >  I don't quite understand why a RecoveryExpr is not created here, 
> > > > > > which caused to the whole do statement disappears on the 
> > > > > > AST(https://godbolt.org/z/PsPb31YKP), should we fix this? 
> > > > > Thanks a lot for your comments! @aaron.ballman 
> > > > > But for error recovery, we will *never* get a valid constant 
> > > > > expression.
> > > > 
> > > > Yeah, this is true, and we don't evaluate these error-dependent 
> > > > expressions.
> > > > 
> > > > I think the question here is that when we encounter an error-dependent 
> > > > expression during a constant expression evaluation, do we want to 
> > > > bailout the whole evaluation immediately, or just ignore the evaluation 
> > > > of this error-dependent expression and continue to proceed when 
> > > > possible?
> > > > 
> > > > We choose 2) -- this was based on the discussion with @rsmith. This 
> > > > will likely give us decent error-recovery and useful followup 
> > > > diagnostics.
> > > > 
> > > > Some concrete examples,
> > > > 
> > > > ```
> > > > int abc();
> > > > constexpr int f() { 
> > > >   error(); 
> > > >   // We emit a diagnostic "Constexpr function never produces a constant 
> > > > expression, because abc() cannot be used in a constant expression"
> > > >   return abc(); 
> > > > }
> > > > ```
> > > > 
> > > > ```
> > > > // We prefer to treat the function f as a constexpr function even 
> > > > though it has an error expression. We will preserve more information in 
> > > > the AST, e.g. clangd's hover on the function call `f()` can give you 
> > > > the return value 1.
> > > > 
> > > > constexpr int f() {
> > > >    error();
> > > >    return 1;
> > > > }
> > > > ```
> > > > 
> > > > > I worry about the performance overhead of continuing on with constant 
> > > > > expression evaluation in the error case. We use these code paths not 
> > > > > only to get a value but to say "is this a constant expression at 
> > > > > all?".
> > > > 
> > > > Yeah, performance maybe a valid concern, but I don't have any data. 
> > > > These code paths are also used for generating diagnostics. I think this 
> > > > is a cost we pay for having nice error recovery.  
> > > > If the performance is a real issue here, we probably need to figure out 
> > > > bottlenecks before coming up solutions.
> > > > 
> > > > 
> > > > > I don't see why the fix should be localized to just the switch 
> > > > > statement condition; it seems like *any* attempt to get a dependent 
> > > > > value from an error recovery expression is a point at which we can 
> > > > > definitively say "this is not a constant expression" and move on.
> > > > 
> > > > Right.
> > > > 
> > > > Sorry for not being clear, I didn't mean the fix should be localized to 
> > > > switch-statement condition only, we already do it for the condition for 
> > > > [for](https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/ExprConstant.cpp#L5452),
> > > >  
> > > > [while](https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/ExprConstant.cpp#L5351)
> > > >  etc -- whenever we encounter a value-dependent condition, we bailout 
> > > > unconditionally, the switch-statement seems to be an oversight in 
> > > > https://reviews.llvm.org/D113752 (sorry for not catching it earlier 
> > > > during the review).
> > > > Do you mean?
> > > > 
> > > > if (SS->getCond()->isValueDependent()) {
> > > >     EvaluateDependentExpr(SS->getCond(), Info);
> > > >     return ESR_Failed;
> > > > }
> > > 
> > > Yeah, this is what I meant.
> > > 
> > > BTW 
> > > https://github.com/llvm/llvm-project/blob/843ff7581408a02e852c0f1f7ebf176cabbc7527/clang/lib/Parse/ParseStmt.cpp#L1894-L1909
> > >  I don't quite understand why a RecoveryExpr is not created here, which 
> > > caused to the whole do statement disappears on the 
> > > AST(https://godbolt.org/z/PsPb31YKP), should we fix this?
> > 
> > RecoveryExpr is created in an ad-hoc way, there are cases we might miss. It 
> > would be nice to fix it so that clang can preserve more AST nodes for 
> > broken code. But this is different issue, no need to address it here.
> > 
> > I think we run into the `if (Cond.isUsable())` branch (rather than the 
> > `else` branch), the Cond expression is a `TypoExpr`, and we fail to resolve 
> > the `TypoExpr`, so the whole do-while statement get discarded during the 
> > parsing.
> > 
> > I think we can probably fix it by passing a true `RecoverUncorrectedTypos` 
> > flag in the `CorrectDelayedTyposInExpr` call there -- the TypoExpr will 
> > fallback to RecoveryExpr if clang's TypoCorrection fails to resolve it.  
> > 
> >>But for error recovery, we will *never* get a valid constant expression.
> > Yeah, this is true, and we don't evaluate these error-dependent expressions.
> 
> That's why I think `EvaluateDependentExpr()` should return false when we 
> *are* evaluating these error-dependent expressions. When we get to the point 
> of saying "I need this value", we're saying "I don't know what the value is 
> but keep going just in case" and that's a potentially significant hit on 
> compile time. Consider: https://godbolt.org/z/cGWrY3fa7 -- the slow case 
> takes about twice as long to compute as the fast case and the diagnostic 
> behavior is the same.
> 
> > I think the question here is that when we encounter an error-dependent 
> > expression during a constant expression evaluation, do we want to bailout 
> > the whole evaluation immediately, or just ignore the evaluation of this 
> > error-dependent expression and continue to proceed when possible?
> 
> I guess I fall on the side of "fail quickly so long as the failure is 
> actionable". I don't see a lot of benefit to the follow-on diagnostics 
> because a `RecoveryExpr` is always fatal anyway and I expect it's more 
> frequent for there to be one compile error in a constant expression than it 
> is for there to be several unrelated compile errors, but that's based on 
> intuition and not anything I've measured.
> 
> Constant expression evaluation and template instantiation absolutely murder 
> compile times in C++, so we need to be careful about the overhead tradeoffs 
> between follow-on diagnostics and compile times. For example, if the constant 
> expression is being used in a SFINAE context, the extra follow-on diagnostics 
> are always overhead without benefit because that template won't instantiate 
> anyway (and that could very well be expected behavior for the given template 
> arguments).
> 
> > If the performance is a real issue here, we probably need to figure out 
> > bottlenecks before coming up solutions.
> 
> Two issues: 1) RecoveryExpr keeps showing up in places where it's not 
> expected and we're having to play whack-a-mole with fixing the resulting 
> crashes. 2) Performance concerns. I think #2 is being handled in other ways, 
> but #1 is what really worries me given that it's often not obvious what the 
> right answer is.
> Consider: https://godbolt.org/z/cGWrY3fa7 -- the slow case takes about twice 
> as long to compute as the fast case and the diagnostic behavior is the same.

Thanks for the example, but I'm not sure I'm convinced -- I think the slow part 
is coming from the constexpr function itself -- it has a heavy while-loop in 
the implementation: 

1) for correct code `f(0x7FFF'FFFF'FFFF'FFFF, 1)`, it is slow
2) for error code `f(0x7FFF'FFFF'FFFF'FFFF, 0)`, it is slow

There is no performance difference between 1) and 2), compiling the correct 
code is already slow. And we have the `constexpr_step_limit` during the 
constant evaluation, which gives us a upper-bound compiling time.

And yeah, we can save some cost by skipping the heady loop if we encounter any 
fatal errors in advance, but this only speeds up error cases (with the cost of 
downgrading diagnostics), I guess I don't get the point why the error-code case 
is more special than the correct-code case, and we just do the optimization for 
it.

IMO, for error code, I think diagnostics are important for error-code, it gives 
users information about the wrong places about their code. I'd image users 
spend most of time on reading these diagnostics and fixing their code

> I guess I fall on the side of "fail quickly so long as the failure is 
> actionable".

I have some concerns:
1) downgrading diagnostics -- imaging there are N fatal errors, now we have to 
compile the code N times in order to view&fix them (which will take more time 
in total). Moreover, it doesn't seem to align with what clang does today (clang 
doesn't stop parsing when encountering the first fatal error because of 
performance, it tries the best to recover from the error and emit as many 
useful diagnostics as possible in a single run in order to provide nice 
diagnostics).

2) it only fixes error-dependent crashes -- ideally, the value-dependent 
mechanism in the constant evaluator is designed to handle normal 
template-dependent code and error-dependent code, though the template-dependent 
code part is not implemented yet (see the 
[FIXME](https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/ExprConstant.cpp#L16244-L16248)),
 we will likely encounter similar crashes if we only special-case the 
error-code.

> Two issues: 1) RecoveryExpr keeps showing up in places where it's not 
> expected and we're having to play whack-a-mole with fixing the resulting 
> crashes. 2) Performance concerns. I think #2 is being handled in other ways, 
> but #1 is what really worries me given that it's often not obvious what the 
> right answer is.

For #1, I'm sorry for the issues caused by `RecoveryExpr` (and thanks for all 
the hard work you have done in tracking, reviewing, and fixing these issues).

I think the fundamental reason is we now keep more AST nodes that would 
otherwise be discarded without RecoveryExpr, these AST nodes are similar to 
regular dependent nodes, and have weaker invariants (e.g. we can now have 
value-dependent expressions outside of dependent contexts), some clang 
codepaths don't handle them well, we have to teach these codepaths.
And I want to emphasize that these fixes are not just ad-hoc fixes, they also 
improve the code health of clang infrastructure in the long run which enables 
technical debt removal etc.

It is frustrating to see these crashes occur from time to time, I will like to 
improve the situation here, I don't have a better solution to find out all of 
them, except fixing them case-by-case (internally we have daily crash clangd 
report, we keep monitoring it and fixing highly-frequent crashes). 

If you see any related issues in the future, feel free to ping me as I might 
miss them, I'm happy to fix them or provide any help.

For 2), it seems to be a known issue of the current constant evaluator, there 
are other attempts (e.g. 
https://www.redhat.com/en/blog/new-constant-expression-interpreter-clang) to 
address the performance issue .


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153296/new/

https://reviews.llvm.org/D153296

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D153296: [AST] Stop evaluate constant expression if the condition expression which in switch statement contains errors

Reply via email to