[PATCH] D137381: [clang][compiler-rt] Exception escape out of an non-unwinding function is an undefined behaviour

Roman Lebedev via Phabricator via cfe-commits Mon, 28 Nov 2022 17:53:30 -0800

lebedev.ri added inline comments.


================
Comment at: clang/lib/CodeGen/CGCall.cpp:5402-5411
+    if (SanOpts.has(SanitizerKind::ExceptionEscape) &&
+        ExceptionEscapeUBLastInvokeSrcLoc) {
+      llvm::Constant *CheckSourceLocation = EmitCheckSourceLocation(Loc);
+      Builder.CreateStore(
+          CheckSourceLocation,
+          Address(ExceptionEscapeUBLastInvokeSrcLoc,
+                  CheckSourceLocation->getType(),
----------------
lebedev.ri wrote:
> rjmccall wrote:
> > lebedev.ri wrote:
> > > lebedev.ri wrote:
> > > > @rjmccall
> > > > So there are two issues:
> > > > 1. `getInvokeDest()` isn't necessarily called just before creating an 
> > > > `invoke`, see e.g. `-EHa` handling in 
> > > > `CodeGenFunction::PopCleanupBlock()`
> > > > 2. Even if we ignore that, we need to do this for *every* `invoke`, not 
> > > > just those going to the our UB landing pad, consider: 
> > > > https://godbolt.org/z/qTeKor41a <- the invoke leads to a normal landing 
> > > > pad, yet we immediately rethrow the just-caught exception, and now end 
> > > > up in the UB landing pad.
> > > > 
> > > > So i'm not really seeing an alternative path here?
> > > @rjmccall do you agree with my reasoning here?
> > > Perhaps there is some other solution that i'm not seeing,
> > > other than not having the source location?
> > I think there are four interesting questions posed by your example.
> > 
> > The first is whether we should make an effort to report the original source 
> > location for the call to `maybe_throws` instead of the rethrow location.  
> > To me, the answer is no, and the main reason is that the user is explicitly 
> > acknowledging the possibility of an exception and is (probably) trying to 
> > handle it.  That means that the proximate source of programmer error is now 
> > that their attempt to handle it failed, not that an exception was raised in 
> > the first place.  That might seem weird for your example specifically, 
> > 
> > The second is what source location we should report for invokes that don't 
> > have an obvious source location.  The answer is that I think we should 
> > always try to have a source location, even if it might be an imperfect one. 
> >  Synthesized code almost always has a source location that triggered the 
> > synthesis.  Cleanups are typically associated with some variable or 
> > temporary that itself should have a source location.  Even things like 
> > `__cxa_end_catch` are associated with the `catch` clause.  Hopefully UBSan  
> > has a flexible enough schema in how source locations are expressed to it 
> > for us to communicate these things down to the runtime, like "synthesized 
> > code at Foo.cpp:18" or "cleanup for variable at Bar.cpp:107".
> > 
> > The third is the more general question of whether we need to store a source 
> > location before every `invoke`.  To ensure that this variable is 
> > meaningfully initialized, we need to store a source location before every 
> > invoke that can lead to a UB scope.  But the variable is stateful, so it's 
> > equally important that we not emit stores that might overwrite the source 
> > location that we actually want to report.  For example, we want the source 
> > location reported in your example to be the source location for the 
> > `throw;`, not something associated with the `__cxa_end_catch` cleanup.  
> > This basically boils down to not doing stores within a landing pad before 
> > it enters a `catch` handler.  Fortunately, landing pad code like that 
> > should always be in a terminate scope, so I think you can just look at the 
> > current EH stack and see whether the landing pad can possibly reach a UB 
> > scope.  (You might need some save-and-restore logic around `@finally` 
> > blocks in ObjC.)
> > 
> > The final question is whether we need to emit cleanups on paths that lead 
> > to UB scopes at all.  In the simplest cases, I think we don't.  The 
> > standard says that it's implementation-specified whether the stack is 
> > unwound on an exception path that leads to termination.  That rule doesn't 
> > technically apply here, but only because we're dealing with a language 
> > extension that makes unwinding UB in the first place.  Since we have room 
> > to choose the semantics, the standard gives us pretty strong cover to be 
> > just as aggressive about skipping cleanups in contexts where throwing is UB 
> > as it is in contexts where throwing terminates.  That, in turn, means that 
> > the landing pad in these contexts will usually just be a UB scope with no 
> > cleanups in it.  So even if we did need to worry about cleanups overwriting 
> > the source location, it usually won't be possible because we won't be 
> > running cleanups.  In fact, if there's a catch-all or a terminate scope, 
> > the UB scope won't be reachable, so the only situation where we have to 
> > worry about cleanups being run before we reach a UB scope is if there are 
> > cleanups inside a *partial* catch clause, like `catch (const std::exception 
> > &e)`.  And in principle we have the information up front from the EH 
> > selector to know whether we're going to end up in a catch handler, so we 
> > could actually just check immediately at the start of the landing pad and 
> > then trigger UBSan.
> > The final question is whether we need to emit cleanups on paths that lead 
> > to UB scopes at all. In the simplest cases, I think we don't. The standard 
> > says that it's implementation-specified whether the stack is unwound on an 
> > exception path that leads to termination. That rule doesn't technically 
> > apply here, but only because we're dealing with a language extension that 
> > makes unwinding UB in the first place. Since we have room to choose the 
> > semantics, the standard gives us pretty strong cover to be just as 
> > aggressive about skipping cleanups in contexts where throwing is UB as it 
> > is in contexts where throwing terminates. That, in turn, means that the 
> > landing pad in these contexts will usually just be a UB scope with no 
> > cleanups in it. So even if we did need to worry about cleanups overwriting 
> > the source location, it usually won't be possible because we won't be 
> > running cleanups. In fact, if there's a catch-all or a terminate scope, the 
> > UB scope won't be reachable, so the only situation where we have to worry 
> > about cleanups being run before we reach a UB scope is if there are 
> > cleanups inside a *partial* catch clause, like catch (const std::exception 
> > &e). And in principle we have the information up front from the EH selector 
> > to know whether we're going to end up in a catch handler, so we could 
> > actually just check immediately at the start of the landing pad and then 
> > trigger UBSan.
> 
> The more i stare at this. The more worried i am.
> Is the idea that, when we are in termination/UB EH scope,
> we no longer have to play by the RAII rules,
> and destructors, for the moment, are no-ops and are side-effect free?
> 
> Consider:
> ```
> #include <cstdlib>
> #include <cstdio>
> 
> void will_throw() { throw int(42); }
> void might_throw() { }
> 
> struct abort_on_dtor {
>   abort_on_dtor() {}
>   [[noreturn]] ~abort_on_dtor() { printf("in ~abort_on_dtor()\n"); abort(); }
> };
> 
> void test0() noexcept {
>   abort_on_dtor scope;
>   will_throw();
>   might_throw();
> }
> 
> __attribute__ ((pure)) void test1() {
>   abort_on_dtor scope;
>   will_throw();
>   might_throw();
> }
> 
> int main(int argc, char* argv[]) {
>   printf("argc = %i\n", argc);
>   if(argc == 1)
>     abort_on_dtor scope;
>   if(argc == 2)
>     test0();
>   if(argc == 3)
>     test1();
>   return 0;
> }
> ```
> You'd think, if call to `test0()`/`test1()` unwinds,
> we'd first destruct `abort_on_dtor`, and termination/UB no longer happens.
> But i guess termination/UB time-traveled/"happened" the moment 
> `test0()`/`test1()` unwinds?
> 
> The final question is whether we need to emit cleanups on paths that lead to 
> UB scopes at all. In the simplest cases

I believe, i have implemented these semantics, i.e.
* no cleanups in UB EH scope, just like how we deal with Terminate scope
* when in UB EH scope, and not sanitizing, all callees are implicitly nounwind
* If `catch()` blocks don't catch the exception, and we get to UB scope, we go 
into `unreachable` block.

Did i understand you correctly?
(test suite does not pass, there are a few bugs to deal with)



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137381/new/

https://reviews.llvm.org/D137381

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D137381: [clang][compiler-rt] Exception escape out of an non-unwinding function is an undefined behaviour

Reply via email to