[PATCH] D137381: [clang][compiler-rt] Exception escape out of an non-unwinding function is an undefined behaviour

Roman Lebedev via Phabricator via cfe-commits Mon, 28 Nov 2022 17:24:19 -0800

lebedev.ri added a reviewer: efriedma.
lebedev.ri added a comment.

@rjmccall thank you for taking a look!


In D137381#3955100 <https://reviews.llvm.org/D137381#3955100>, @rjmccall wrote:

> In D137381#3953178 <https://reviews.llvm.org/D137381#3953178>, @lebedev.ri 
> wrote:
>
>> ping. We need to get this going, reminder, this was caught because it caused 
>> a visible miscompilation.
>
> I'm sorry, I caught COVID-19 during the LLVM conference and have not had a 
> lot of energy recently,

I'm sorry.

> plus last week was the Thanksgiving holiday in the US.



> Please don't describe this as a miscompilation.  It may not match what you 
> want as a user, but what the the compiler is doing is not incorrect.

Right.

> What you're doing is adding a mitigation to try to protect against a source 
> of UB that recently affected you, which is commendable but doesn't have the 
> same level of urgency.

I'm mainly worried that there are likely other "miscompilations" in the wild.
Also, after this, we might want to reevaluate which clang attribute does what,



================
Comment at: clang/lib/CodeGen/CGCall.cpp:5402-5411
+    if (SanOpts.has(SanitizerKind::ExceptionEscape) &&
+        ExceptionEscapeUBLastInvokeSrcLoc) {
+      llvm::Constant *CheckSourceLocation = EmitCheckSourceLocation(Loc);
+      Builder.CreateStore(
+          CheckSourceLocation,
+          Address(ExceptionEscapeUBLastInvokeSrcLoc,
+                  CheckSourceLocation->getType(),
----------------
rjmccall wrote:
> lebedev.ri wrote:
> > lebedev.ri wrote:
> > > @rjmccall
> > > So there are two issues:
> > > 1. `getInvokeDest()` isn't necessarily called just before creating an 
> > > `invoke`, see e.g. `-EHa` handling in `CodeGenFunction::PopCleanupBlock()`
> > > 2. Even if we ignore that, we need to do this for *every* `invoke`, not 
> > > just those going to the our UB landing pad, consider: 
> > > https://godbolt.org/z/qTeKor41a <- the invoke leads to a normal landing 
> > > pad, yet we immediately rethrow the just-caught exception, and now end up 
> > > in the UB landing pad.
> > > 
> > > So i'm not really seeing an alternative path here?
> > @rjmccall do you agree with my reasoning here?
> > Perhaps there is some other solution that i'm not seeing,
> > other than not having the source location?
> I think there are four interesting questions posed by your example.
> 
> The first is whether we should make an effort to report the original source 
> location for the call to `maybe_throws` instead of the rethrow location.  To 
> me, the answer is no, and the main reason is that the user is explicitly 
> acknowledging the possibility of an exception and is (probably) trying to 
> handle it.  That means that the proximate source of programmer error is now 
> that their attempt to handle it failed, not that an exception was raised in 
> the first place.  That might seem weird for your example specifically, 
> 
> The second is what source location we should report for invokes that don't 
> have an obvious source location.  The answer is that I think we should always 
> try to have a source location, even if it might be an imperfect one.  
> Synthesized code almost always has a source location that triggered the 
> synthesis.  Cleanups are typically associated with some variable or temporary 
> that itself should have a source location.  Even things like 
> `__cxa_end_catch` are associated with the `catch` clause.  Hopefully UBSan  
> has a flexible enough schema in how source locations are expressed to it for 
> us to communicate these things down to the runtime, like "synthesized code at 
> Foo.cpp:18" or "cleanup for variable at Bar.cpp:107".
> 
> The third is the more general question of whether we need to store a source 
> location before every `invoke`.  To ensure that this variable is meaningfully 
> initialized, we need to store a source location before every invoke that can 
> lead to a UB scope.  But the variable is stateful, so it's equally important 
> that we not emit stores that might overwrite the source location that we 
> actually want to report.  For example, we want the source location reported 
> in your example to be the source location for the `throw;`, not something 
> associated with the `__cxa_end_catch` cleanup.  This basically boils down to 
> not doing stores within a landing pad before it enters a `catch` handler.  
> Fortunately, landing pad code like that should always be in a terminate 
> scope, so I think you can just look at the current EH stack and see whether 
> the landing pad can possibly reach a UB scope.  (You might need some 
> save-and-restore logic around `@finally` blocks in ObjC.)
> 
> The final question is whether we need to emit cleanups on paths that lead to 
> UB scopes at all.  In the simplest cases, I think we don't.  The standard 
> says that it's implementation-specified whether the stack is unwound on an 
> exception path that leads to termination.  That rule doesn't technically 
> apply here, but only because we're dealing with a language extension that 
> makes unwinding UB in the first place.  Since we have room to choose the 
> semantics, the standard gives us pretty strong cover to be just as aggressive 
> about skipping cleanups in contexts where throwing is UB as it is in contexts 
> where throwing terminates.  That, in turn, means that the landing pad in 
> these contexts will usually just be a UB scope with no cleanups in it.  So 
> even if we did need to worry about cleanups overwriting the source location, 
> it usually won't be possible because we won't be running cleanups.  In fact, 
> if there's a catch-all or a terminate scope, the UB scope won't be reachable, 
> so the only situation where we have to worry about cleanups being run before 
> we reach a UB scope is if there are cleanups inside a *partial* catch clause, 
> like `catch (const std::exception &e)`.  And in principle we have the 
> information up front from the EH selector to know whether we're going to end 
> up in a catch handler, so we could actually just check immediately at the 
> start of the landing pad and then trigger UBSan.
> The final question is whether we need to emit cleanups on paths that lead to 
> UB scopes at all. In the simplest cases, I think we don't. The standard says 
> that it's implementation-specified whether the stack is unwound on an 
> exception path that leads to termination. That rule doesn't technically apply 
> here, but only because we're dealing with a language extension that makes 
> unwinding UB in the first place. Since we have room to choose the semantics, 
> the standard gives us pretty strong cover to be just as aggressive about 
> skipping cleanups in contexts where throwing is UB as it is in contexts where 
> throwing terminates. That, in turn, means that the landing pad in these 
> contexts will usually just be a UB scope with no cleanups in it. So even if 
> we did need to worry about cleanups overwriting the source location, it 
> usually won't be possible because we won't be running cleanups. In fact, if 
> there's a catch-all or a terminate scope, the UB scope won't be reachable, so 
> the only situation where we have to worry about cleanups being run before we 
> reach a UB scope is if there are cleanups inside a *partial* catch clause, 
> like catch (const std::exception &e). And in principle we have the 
> information up front from the EH selector to know whether we're going to end 
> up in a catch handler, so we could actually just check immediately at the 
> start of the landing pad and then trigger UBSan.

The more i stare at this. The more worried i am.
Is the idea that, when we are in termination/UB EH scope,
we no longer have to play by the RAII rules,
and destructors, for the moment, are no-ops and are side-effect free?

Consider:
```
#include <cstdlib>
#include <cstdio>

void will_throw() { throw int(42); }
void might_throw() { }

struct abort_on_dtor {
  abort_on_dtor() {}
  [[noreturn]] ~abort_on_dtor() { printf("in ~abort_on_dtor()\n"); abort(); }
};

void test0() noexcept {
  abort_on_dtor scope;
  will_throw();
  might_throw();
}

__attribute__ ((pure)) void test1() {
  abort_on_dtor scope;
  will_throw();
  might_throw();
}

int main(int argc, char* argv[]) {
  printf("argc = %i\n", argc);
  if(argc == 1)
    abort_on_dtor scope;
  if(argc == 2)
    test0();
  if(argc == 3)
    test1();
  return 0;
}
```
You'd think, if call to `test0()`/`test1()` unwinds,
we'd first destruct `abort_on_dtor`, and termination/UB no longer happens.
But i guess termination/UB time-traveled/"happened" the moment 
`test0()`/`test1()` unwinds?



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137381/new/

https://reviews.llvm.org/D137381

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D137381: [clang][compiler-rt] Exception escape out of an non-unwinding function is an undefined behaviour

Reply via email to