[PATCH] D80366: [Analyzer] Add `getReturnValueUnderConstruction()` to `CallEvent`

Balogh , Ádám via Phabricator via cfe-commits Wed, 03 Jun 2020 07:09:02 -0700

baloghadamsoftware marked 4 inline comments as done.
baloghadamsoftware added inline comments.



================
Comment at: clang/lib/StaticAnalyzer/Core/CallEvent.cpp:543
+
+  const StackFrameContext *StackFrame = Call.getCalleeStackFrame(BlockCount);
+  if (!StackFrame)
----------------
NoQ wrote:
> It is clear that the final return value of `getConstructionContext()` does 
> not depend on the current block count; in fact the information you're trying 
> to obtain is purely syntactic. You probably might as well pass 0 here. As i 
> mentioned before, the ideal solution would be to make `CallEvent` carry a 
> `CFGElementRef` to begin with (as a replacement for the origin expr), so that 
> it didn't need to be recovered like this.
> 
> We should absolutely eliminate the `BlockCount` parameter of 
> `getReturnValueUnderConstruction()` as well. Checker developers shouldn't 
> understand what it is, especially given that it doesn't matter at all which 
> value do you pass.
OK, I eliminated it. Do we need `BlockCount` for `getParameterLocation()` (the 
other patch)?


================
Comment at: clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp:112
 
+Optional<SVal> ExprEngine::retrieveFromConstructionContext(
+    ProgramStateRef State, const LocationContext *LCtx,
----------------
NoQ wrote:
> baloghadamsoftware wrote:
> > martong wrote:
> > > baloghadamsoftware wrote:
> > > > NoQ wrote:
> > > > > balazske wrote:
> > > > > > baloghadamsoftware wrote:
> > > > > > > baloghadamsoftware wrote:
> > > > > > > > baloghadamsoftware wrote:
> > > > > > > > > baloghadamsoftware wrote:
> > > > > > > > > > NoQ wrote:
> > > > > > > > > > > baloghadamsoftware wrote:
> > > > > > > > > > > > NoQ wrote:
> > > > > > > > > > > > > Please instead re-use the code that computes the 
> > > > > > > > > > > > > object under construction. That'll save you ~50 lines 
> > > > > > > > > > > > > of code and will be more future-proof (eg., 
> > > > > > > > > > > > > standalone temporaries without destructor technically 
> > > > > > > > > > > > > have a construction context with 0 items so when we 
> > > > > > > > > > > > > implement them correctly your procedure will stop 
> > > > > > > > > > > > > working).
> > > > > > > > > > > > That was so my first thought. However, 
> > > > > > > > > > > > `handleConstructionContext()` is private and 
> > > > > > > > > > > > non-static. Now I tried to merge the two methods: if 
> > > > > > > > > > > > the value is already in the construction context, we 
> > > > > > > > > > > > return it, if not then we add it. Is this what you 
> > > > > > > > > > > > suggest? Or did I misunderstand you? At the very 
> > > > > > > > > > > > beginning I tried to simply use 
> > > > > > > > > > > > `handleConstructionContext()`, but it asserted because 
> > > > > > > > > > > > the value was already in the map.
> > > > > > > > > > > I'd like to preserve the assertion that 
> > > > > > > > > > > objects-under-construction are never filled twice; it's a 
> > > > > > > > > > > very useful sanity check. What you need in your checker 
> > > > > > > > > > > is a function that computes object-under-construction but 
> > > > > > > > > > > doesn't put it into the objects-under-construction map. 
> > > > > > > > > > > So you have to separate the computation from filling in 
> > > > > > > > > > > the state.
> > > > > > > > > > OK, so I (fortunately) misundertood you. Thus I should 
> > > > > > > > > > refactor this function to a calculation and a storing part?
> > > > > > > > > OK, I see what you are speaking about, but I have no idea how 
> > > > > > > > > to do it properly. The problem is that the control logic of 
> > > > > > > > > filling in the state also depends on the kind of the 
> > > > > > > > > construction context. For some kinds we do not fill at all. 
> > > > > > > > > Every way I try it becomes more complex and less correct:
> > > > > > > > > 
> > > > > > > > > 1) `NewAllocatedObjectKind`: we do not add this to the state, 
> > > > > > > > > we only retrieve the original.
> > > > > > > > > 2) `SimpleReturnedValueKind` and 
> > > > > > > > > `CXX17ElidedCopyReturnedValueKind`: depending on whether we 
> > > > > > > > > are in top frame we handle this case recursively or we do not 
> > > > > > > > > fill at all, just return the value. What is the construction 
> > > > > > > > > context item here? Maybe the `ReturnStmt`?
> > > > > > > > > 3) `ElidedTemporaryObjectKind`: this is the most problematic: 
> > > > > > > > > we first handle it recursively for the construction context 
> > > > > > > > > after elision, then we also fill it for the elided temporary 
> > > > > > > > > object construction context as well.
> > > > > > > > > 
> > > > > > > > > The only thing I can (maybe) do is to retrieve the 
> > > > > > > > > construction context item. But then the switch is still 
> > > > > > > > > duplicated for filling, because of the different control 
> > > > > > > > > logic for different construction context kinds.
> > > > > > > > > The only thing I can (maybe) do is to retrieve the 
> > > > > > > > > construction context item.
> > > > > > > > 
> > > > > > > > This does not help even for retrieving the value, because its 
> > > > > > > > control logic also depends on the construction context kind. If 
> > > > > > > > I do it, it will be code triplication instead of duplication 
> > > > > > > > and results in a code that is worse to understand than the 
> > > > > > > > current one.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Please instead re-use the code that computes the object under 
> > > > > > > > construction. That'll save you ~50 lines of code and will be 
> > > > > > > > more future-proof (eg., standalone temporaries without 
> > > > > > > > destructor technically have a construction context with 0 items 
> > > > > > > > so when we implement them correctly your procedure will stop 
> > > > > > > > working).
> > > > > > > 
> > > > > > > Any solution I can come up with rather adds 100 to 150 lines to 
> > > > > > > the code instead of saving 50. For retrieving we only have to 
> > > > > > > determine the construction context item (the key). For storing we 
> > > > > > > also have to calculate the value. It sounds good to calculate 
> > > > > > > these things in separate functions and then have a simple 
> > > > > > > retriever and store function but there are lots of recursions, 
> > > > > > > double strores, non-stores, retrieving in the store function that 
> > > > > > > make it too complicated.
> > > > > > > 
> > > > > > > Also retrieving is more complex than just determining the item 
> > > > > > > and getting the value from the context: if you look at 
> > > > > > > `SimpleReturnedValueKind` and `CXX17ElidedCopyReturnedValueKind` 
> > > > > > > you can see that we do not use the construction context we got in 
> > > > > > > the parameter (the construction context of the return value) but 
> > > > > > > the construction context of the call instead. For 
> > > > > > > `ElidedTemporaryObjectKind` we take the construction context 
> > > > > > > before the elusion.
> > > > > > > 
> > > > > > > Future proofness: I agree, this is a problem to solve. In cases 
> > > > > > > where the construction context is empty maybe we can move the 
> > > > > > > calculation of the values to a separate function that is called 
> > > > > > > by both the "handler" and the "retriever".
> > > > > > Do I think correctly that `retrieveFromConstructionContext` (in the 
> > > > > > previous version) should return the same value as second part of 
> > > > > > `handleConstructionContext`? It does not look obvious at first. Or 
> > > > > > do we need a function with that purpose? If yes it looks a 
> > > > > > difficult task because the similar "logic" to get the value and 
> > > > > > update the state. Probably it is enough to add a flag to 
> > > > > > `handleConstructionContext` to not make new state. The current code 
> > > > > > looks bad because calling a "handle" type of function (that was 
> > > > > > private before) only to compute a value.
> > > > > > 
> > > > > > Any solution I can come up with rather adds 100 to 150 lines to the 
> > > > > > code instead of saving 50.
> > > > > 
> > > > > I think the following is a fairly literal implementation of my 
> > > > > suggestion:
> > > > > 
> > > > > {F12020285}
> > > > > 
> > > > > It's 26 lines shorter than your `retrieveFromConstructionContext()` 
> > > > > (ok, not 50, duplicating the switch was more annoying than i 
> > > > > expected), it adds zero new logic (the only new piece of logic is to 
> > > > > track constructors that were modeled correctly but their elision 
> > > > > wasn't; previously we could fit it into control flow), it no longer 
> > > > > tries to reverse-engineer the original behavior by duplicating its 
> > > > > logic, and it's more future-proof for the reasons explained above.
> > > > Thank you for working on this.  The point what I did not see (and I 
> > > > still no not see it) is that in your solution (and your proposal) 
> > > > instead of retrieving the value from the construction context we 
> > > > actually recalculate it, thus we do not use the construction context at 
> > > > all (except for `NewAllocatedObjectKind`) to retrieve the location of 
> > > > the return value. Can we guarantee that we always recalculate exactly 
> > > > the same symbolic value as the symbolic values we store in the map? For 
> > > > example, for `SimpleReturnedValueKind` et. al. we conjure a new 
> > > > symbolic value. Is it really the same value than that the one in the 
> > > > map, thus the one which is the real location where the object is 
> > > > constructed? If not, then the checkers might not work correctly because 
> > > > they store into the GDM using a different region as key than they use 
> > > > for retrieving the value.
> > > > I think the following is a fairly literal implementation of my 
> > > > suggestion: ... It's 26 lines shorter than your 
> > > > retrieveFromConstructionContext() (ok, not 50, duplicating the switch 
> > > > was more annoying than i expected), it adds zero new logic (the only 
> > > > new piece of logic is to track constructors that were modeled correctly 
> > > > but their elision wasn't; previously we could fit it into control 
> > > > flow), it no longer tries to reverse-engineer the original behavior by 
> > > > duplicating its logic, and it's more future-proof for the reasons 
> > > > explained above.
> > > 
> > > +1 for this version. Even though Adam has valid concerns, this version 
> > > contains functions with clear responsibilities.
> > > 
> > > > Can we guarantee that we always recalculate exactly the same symbolic 
> > > > value as the symbolic values we store in the map? For example, for 
> > > > SimpleReturnedValueKind et. al. we conjure a new symbolic value. Is it 
> > > > really the same value than that the one in the map, thus the one which 
> > > > is the real location where the object is constructed?
> > > 
> > > I think the guarantee is that in Artem's implementation the switch is 
> > > duplicated. That would be okay for now, but I still consider this 
> > > error-prone, exactly because of the duplication. So, maybe this yields 
> > > for a common Visitor.
> > > 
> > > Guys, would it be possible to have a common Visitor for these? Could be 
> > > somewhat similar to `DeclVisitor` with return type as type parameter. And 
> > > we could pass a lambda that is called in every `case`? Or would that  be 
> > > too overkill
> > > 
> > > I think the guarantee is that in Artem's implementation the switch is 
> > > duplicated.
> > 
> > That is not my point. My point is that `handleConstructionContext()` is 
> > called, a value is calculated and stored in the map. This calculation may 
> > be the creation of a new conjured symbol. Thereafter, whenever a checker 
> > tries to retrieve the location of the return value from the construction 
> > context in Artem's solution we do not retrieve the value previously stored 
> > in the map, but calculate a new one, which is not necessarily the same as 
> > the one we stored in the map, because a in some cases a new symbol is 
> > conjured. Then the checker uses this new symbol as the index in GDM, but 
> > the object is constructed to the location stored in the map. Therefore, 
> > later when the checker tries to retrieve data from the GDM using the 
> > location of the existing object as a key it cannot find the data because it 
> > was stored using the wrong key.
> > 
> > `getReturnValueUnderConstruction()` must always return the location where 
> > the object is to be constructed. Therefore I think it must not recalculate 
> > the location which may involve creation of new conjured symbols but first 
> > it must look into the map of objects under construction. This guarantees 
> > that the returned location is really the location where the object is 
> > constructed.
> Yes it should always return the same value, because there is exactly one 
> correct solution to the problem that it's solving. If it may return different 
> values for the same `(E, State, LCtx, CC)` tuple, it's a bug. This method 
> should ultimately be turned into a static method.
> 
> Loading a stored value if it exists will only sweep the problem under the 
> rug. It'd be much better for us to //assert// that the freshly recomputed 
> value is equal to the stored value if the latter is present.
> 
> The code that constructs symbolic region for the return value of a top frame 
> is indeed suspicious. I'm not sure it can actually return different regions 
> twice but this code is clearly incorrect because `SymbolConjured` is never 
> the correct solution anyway. The only reason `SymbolConjured` is used 
> anywhere is because the author (in this case, i) was too lazy to define a new 
> symbol class that has the right identity. `SymbolConjured` is like 
> `UnknownVal` that's a bit less unknown but still screams "not implemented 
> yet". I think this definitely deserves a fixme.
> 
> > I think the guarantee is that in Artem's implementation the switch is 
> > duplicated. That would be okay for now, but I still consider this 
> > error-prone, exactly because of the duplication. So, maybe this yields for 
> > a common Visitor.
> > Guys, would it be possible to have a common Visitor for these?
> 
> Mmm, i don't think i understand this argument. A visitor //is// a switch with 
> a fancy interface and wildcards and lacking checks for unhandled cases.
> 
> Generally though i wouldn't mind having a `ConstructionContextVisitor`. For 
> instance, that'll allow us to replace code like
> ```lang=c++
>     case ConstructionContext::CXX17ElidedCopyVariableKind:
>     case ConstructionContext::SimpleVariableKind: {
>     }
> ```
> with a single
> ```lang=c++
>     VisitVariable(VariableConstructionContext *CC) {
>     }
> ```
> 
> I wouldn't like a visitor for `ConstructionContextItem`s because they are 
> fairly meaningless when taken out of, well, //context//.
A `Visitor` is a good idea. However, is it used anywhere else? If it is, then 
it is worth to do it, but I think it can go into a new patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80366/new/

https://reviews.llvm.org/D80366



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D80366: [Analyzer] Add `getReturnValueUnderConstruction()` to `CallEvent`

Reply via email to