haoNoQ wrote:

> Thanks for the info -- I wasn't aware of `ConstructionContext`. Will take a 
> look. Can you point me to some examples of how this is used?

The initial motivation was here: 
https://github.com/llvm/llvm-project/blob/llvmorg-19-init/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp#L130
 - this is the place where the clang static analyzer's path-sensitive engine 
looks at the `CXXConstructExpr` in the CFG and utilizes the additional 
information to foresee the storage in which the prvalue is about to be 
constructed. So that when the constructor call is modeled, we knew what the 
`this` value is. Which, yeah, naturally allows us to map every field of the 
object to the field in the storage, if necessary. Or just to model the value of 
`this` if the code ever tries to use it directly.

Additionally, it allows us to keep track of that storage location for the 
purposes of future events (see the next function at 
https://github.com/llvm/llvm-project/blob/llvmorg-19-init/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp#L408
 where we write the evaluated `MemRegion *` back into our abstract state). This 
is particularly useful for temporary destructors in cases like this:
```
foo(x ? MyClass(1, 2, 3) : MyClass(4, 5, 6));
```
where the CFG looks like a double diamond: we may construct one of the two 
objects, we may destroy one of the two objects, but we merge in the middle in 
order to call `foo()` unconditionally. So you need to figure out at runtime 
which object gets destroyed. And by the time we reach the destructor at the end 
of the full-expression (basically `ExprWithCleanups`) we already forgot which 
of these two objects we needed to destroy. There's no longer any expression 
that evaluates to the location of that object at this point, so if we don't 
keep track of it separately, we don't remember it at all. Or, when `foo()` 
accepts the object as `const MyClass &`, some branches don't even require a 
temporary destructor at all, but the CFG is still the same double diamond:
```
MyClass existing_object(4, 5, 6);
foo(x ? MyClass(1, 2, 3) : existing_object);
```
Thanks to Manuel Klimek's hard work, the magic CFG terminator at the beginning 
of the bottom diamond (which corresponds to the runtime branch responsible for 
conditionally invoking the destructor) remembers a `CXXBindTemporaryExpr *` 
that points to the exact object that we need to destroy. Because this 
`CXXBindTemporaryExpr` is foretold by the `ConstructionContext`, we can write 
it down in the abstract state at construction time and then make the decision 
on the magic terminator based on the contents of the abstract state. (Another 
way to organize this would be to write down this info in the abstract state 
slightly later, when we reach the actual `CXXBindTemporaryExpr`. But in our 
case we'd probably still need to write down the exact `MemRegion`, so it'd 
still require the extra annotations in the CFG to compute that region, so we 
might as well do that at construction where these annotations have already been 
made available available. Especially given that we needed to foretell the 
respective `MaterializeTemporaryExpr` too, and then possibly copy elision on 
top of it, which could be another reason why a temporary destructor may 
suddenly become unnecessary.)

So, yeah, the construction context helps us discover facts about the program 
such as "if `x` is false then this invocation of `foo()` doesn't cause 
`~MyClass()` to be invoked". Which is something you can probably discover in 
simple cases by simply looking at the AST in a top-down manner ("if it never 
constructs the object then it never destroys it, makes sense right?") but if 
you're doing a principled one-step-at-a-time analysis over the CFG then the 
ability to "foretell" the destructor makes your life much easier, simplifies 
the code by a lot.

I made a tiny conference talk about this (the second half of 
https://www.youtube.com/watch?v=4n3l-ZcDJNY&t=692s). The interesting part is 
probably where I enumerate all the ~30 cases I've found, that produce 
substantially different AST, which probably require a different 
`ConstructionContext` subclass for each of them, to properly foretell 
everything we needed it to foretell in each case:
<img width="1052" alt="Screenshot 2024-07-30 at 7 21 27 PM" 
src="https://github.com/user-attachments/assets/4dbf46b9-a6da-45d7-85c9-8235e8c4ded6";>
(A lot of these also produce completely different ASTs before and after C++17.)

I'm very open to changes in `ConstructionContext` to accommodate your needs, 
like provide extra information that we didn't need but you did, or provide it 
in a different way. It's probably not dramatically different from what you're 
doing (i.e. your `RecordStorageLocation` appears to be very similar, albeit 
less abstract), but I suspect that it's slightly more mature at this point, and 
I'm fairly confident in the design:
* `ConstructionContext` looks flexible enough to handle even the most obscure 
cases, assuming that the tool knows how to interpret it;
* I still think that it's a very good idea to separate construction contexts 
into different subclasses because in many cases the tool needs to handle them 
differently even when the information they carry may be the same... in 
particular, a lot of the things wouldn't out work for us if we tried to reduce 
`ConstructionContext` to a single `Stmt *`;
* In particular, I still think that there's a finite amount of 
`ConstructionContext`s in the language: even though you're allowed to put an 
arbitrary amount of `?:` operators between the `CXXConstructExpr` and its 
respective `MaterializeTemporaryExpr`, we're still able cover all of these 
patterns with the same `ConstructionContext` subclass that carries a fixed 
amount of AST pointers.

https://github.com/llvm/llvm-project/pull/99616
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to