lebedev.ri added a subscriber: tentzen.
lebedev.ri added inline comments.

================
Comment at: clang/lib/CodeGen/CGCall.cpp:5402-5411
+    if (SanOpts.has(SanitizerKind::ExceptionEscape) &&
+        ExceptionEscapeUBLastInvokeSrcLoc) {
+      llvm::Constant *CheckSourceLocation = EmitCheckSourceLocation(Loc);
+      Builder.CreateStore(
+          CheckSourceLocation,
+          Address(ExceptionEscapeUBLastInvokeSrcLoc,
+                  CheckSourceLocation->getType(),
----------------
lebedev.ri wrote:
> lebedev.ri wrote:
> > rjmccall wrote:
> > > lebedev.ri wrote:
> > > > lebedev.ri wrote:
> > > > > @rjmccall
> > > > > So there are two issues:
> > > > > 1. `getInvokeDest()` isn't necessarily called just before creating an 
> > > > > `invoke`, see e.g. `-EHa` handling in 
> > > > > `CodeGenFunction::PopCleanupBlock()`
> > > > > 2. Even if we ignore that, we need to do this for *every* `invoke`, 
> > > > > not just those going to the our UB landing pad, consider: 
> > > > > https://godbolt.org/z/qTeKor41a <- the invoke leads to a normal 
> > > > > landing pad, yet we immediately rethrow the just-caught exception, 
> > > > > and now end up in the UB landing pad.
> > > > > 
> > > > > So i'm not really seeing an alternative path here?
> > > > @rjmccall do you agree with my reasoning here?
> > > > Perhaps there is some other solution that i'm not seeing,
> > > > other than not having the source location?
> > > I think there are four interesting questions posed by your example.
> > > 
> > > The first is whether we should make an effort to report the original 
> > > source location for the call to `maybe_throws` instead of the rethrow 
> > > location.  To me, the answer is no, and the main reason is that the user 
> > > is explicitly acknowledging the possibility of an exception and is 
> > > (probably) trying to handle it.  That means that the proximate source of 
> > > programmer error is now that their attempt to handle it failed, not that 
> > > an exception was raised in the first place.  That might seem weird for 
> > > your example specifically, 
> > > 
> > > The second is what source location we should report for invokes that 
> > > don't have an obvious source location.  The answer is that I think we 
> > > should always try to have a source location, even if it might be an 
> > > imperfect one.  Synthesized code almost always has a source location that 
> > > triggered the synthesis.  Cleanups are typically associated with some 
> > > variable or temporary that itself should have a source location.  Even 
> > > things like `__cxa_end_catch` are associated with the `catch` clause.  
> > > Hopefully UBSan  has a flexible enough schema in how source locations are 
> > > expressed to it for us to communicate these things down to the runtime, 
> > > like "synthesized code at Foo.cpp:18" or "cleanup for variable at 
> > > Bar.cpp:107".
> > > 
> > > The third is the more general question of whether we need to store a 
> > > source location before every `invoke`.  To ensure that this variable is 
> > > meaningfully initialized, we need to store a source location before every 
> > > invoke that can lead to a UB scope.  But the variable is stateful, so 
> > > it's equally important that we not emit stores that might overwrite the 
> > > source location that we actually want to report.  For example, we want 
> > > the source location reported in your example to be the source location 
> > > for the `throw;`, not something associated with the `__cxa_end_catch` 
> > > cleanup.  This basically boils down to not doing stores within a landing 
> > > pad before it enters a `catch` handler.  Fortunately, landing pad code 
> > > like that should always be in a terminate scope, so I think you can just 
> > > look at the current EH stack and see whether the landing pad can possibly 
> > > reach a UB scope.  (You might need some save-and-restore logic around 
> > > `@finally` blocks in ObjC.)
> > > 
> > > The final question is whether we need to emit cleanups on paths that lead 
> > > to UB scopes at all.  In the simplest cases, I think we don't.  The 
> > > standard says that it's implementation-specified whether the stack is 
> > > unwound on an exception path that leads to termination.  That rule 
> > > doesn't technically apply here, but only because we're dealing with a 
> > > language extension that makes unwinding UB in the first place.  Since we 
> > > have room to choose the semantics, the standard gives us pretty strong 
> > > cover to be just as aggressive about skipping cleanups in contexts where 
> > > throwing is UB as it is in contexts where throwing terminates.  That, in 
> > > turn, means that the landing pad in these contexts will usually just be a 
> > > UB scope with no cleanups in it.  So even if we did need to worry about 
> > > cleanups overwriting the source location, it usually won't be possible 
> > > because we won't be running cleanups.  In fact, if there's a catch-all or 
> > > a terminate scope, the UB scope won't be reachable, so the only situation 
> > > where we have to worry about cleanups being run before we reach a UB 
> > > scope is if there are cleanups inside a *partial* catch clause, like 
> > > `catch (const std::exception &e)`.  And in principle we have the 
> > > information up front from the EH selector to know whether we're going to 
> > > end up in a catch handler, so we could actually just check immediately at 
> > > the start of the landing pad and then trigger UBSan.
> > > The final question is whether we need to emit cleanups on paths that lead 
> > > to UB scopes at all. In the simplest cases, I think we don't. The 
> > > standard says that it's implementation-specified whether the stack is 
> > > unwound on an exception path that leads to termination. That rule doesn't 
> > > technically apply here, but only because we're dealing with a language 
> > > extension that makes unwinding UB in the first place. Since we have room 
> > > to choose the semantics, the standard gives us pretty strong cover to be 
> > > just as aggressive about skipping cleanups in contexts where throwing is 
> > > UB as it is in contexts where throwing terminates. That, in turn, means 
> > > that the landing pad in these contexts will usually just be a UB scope 
> > > with no cleanups in it. So even if we did need to worry about cleanups 
> > > overwriting the source location, it usually won't be possible because we 
> > > won't be running cleanups. In fact, if there's a catch-all or a terminate 
> > > scope, the UB scope won't be reachable, so the only situation where we 
> > > have to worry about cleanups being run before we reach a UB scope is if 
> > > there are cleanups inside a *partial* catch clause, like catch (const 
> > > std::exception &e). And in principle we have the information up front 
> > > from the EH selector to know whether we're going to end up in a catch 
> > > handler, so we could actually just check immediately at the start of the 
> > > landing pad and then trigger UBSan.
> > 
> > The more i stare at this. The more worried i am.
> > Is the idea that, when we are in termination/UB EH scope,
> > we no longer have to play by the RAII rules,
> > and destructors, for the moment, are no-ops and are side-effect free?
> > 
> > Consider:
> > ```
> > #include <cstdlib>
> > #include <cstdio>
> > 
> > void will_throw() { throw int(42); }
> > void might_throw() { }
> > 
> > struct abort_on_dtor {
> >   abort_on_dtor() {}
> >   [[noreturn]] ~abort_on_dtor() { printf("in ~abort_on_dtor()\n"); abort(); 
> > }
> > };
> > 
> > void test0() noexcept {
> >   abort_on_dtor scope;
> >   will_throw();
> >   might_throw();
> > }
> > 
> > __attribute__ ((pure)) void test1() {
> >   abort_on_dtor scope;
> >   will_throw();
> >   might_throw();
> > }
> > 
> > int main(int argc, char* argv[]) {
> >   printf("argc = %i\n", argc);
> >   if(argc == 1)
> >     abort_on_dtor scope;
> >   if(argc == 2)
> >     test0();
> >   if(argc == 3)
> >     test1();
> >   return 0;
> > }
> > ```
> > You'd think, if call to `test0()`/`test1()` unwinds,
> > we'd first destruct `abort_on_dtor`, and termination/UB no longer happens.
> > But i guess termination/UB time-traveled/"happened" the moment 
> > `test0()`/`test1()` unwinds?
> > 
> > The final question is whether we need to emit cleanups on paths that lead 
> > to UB scopes at all. In the simplest cases
> 
> I believe, i have implemented these semantics, i.e.
> * no cleanups in UB EH scope, just like how we deal with Terminate scope
> * when in UB EH scope, and not sanitizing, all callees are implicitly nounwind
> * If `catch()` blocks don't catch the exception, and we get to UB scope, we 
> go into `unreachable` block.
> 
> Did i understand you correctly?
> (test suite does not pass, there are a few bugs to deal with)
> 
Ok, trying to continue looking into this

> The first is whether we should make an effort to report the original source 
> location for the call to maybe_throws instead of the rethrow location. To me, 
> the answer is no,

Sure, seems reasonable.

> The second is what source location we should report for invokes that don't 
> have an obvious source location. The answer is that I think we should always 
> try to have a source location, even if it might be an imperfect one. 

It seems like this point is non-actionable. We simply always emit the source 
location,
and if it is invalid, well, then that is a more general problem.

> The third is the more general question of whether we need to store a source 
> location before every invoke. To ensure that this variable is meaningfully 
> initialized, we need to store a source location before every invoke that can 
> lead to a UB scope. But the variable is stateful, so it's equally important 
> that we not emit stores that might overwrite the source location that we 
> actually want to report. For example, we want the source location reported in 
> your example to be the source location for the throw;, not something 
> associated with the __cxa_end_catch cleanup. This basically boils down to not 
> doing stores within a landing pad before it enters a catch handler. 
> Fortunately, landing pad code like that should always be in a terminate 
> scope, so I think you can just look at the current EH stack and see whether 
> the landing pad can possibly reach a UB scope. (You might need some 
> save-and-restore logic around @finally blocks in ObjC.)

I've fixed source location for `throw`, but i'm not quite following the rest of 
the comment here.
Can you perhaps give a self-contained example where we'd end up with wrong line?
I'm not sure about ObjC, so i'll ignore that bit for a moment at least.


================
Comment at: clang/test/CodeGen/windows-seh-EHa-CppCondiTemps.cpp:3
 
+// FIXME: this check appears to be miscompiled?
+// XFAIL: *
----------------
This test broke once we always started adding (outermost) UB scope for nounwind 
functions.
I don't quite get what is going wrong. It could be a bug in SEH handling.
Can someone who has some idea about that code take a look and suggest a fix?
@tentzen ?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137381/new/

https://reviews.llvm.org/D137381

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to