baloghadamsoftware marked 4 inline comments as done. baloghadamsoftware added inline comments.
================ Comment at: clang/lib/StaticAnalyzer/Core/CallEvent.cpp:543 + + const StackFrameContext *StackFrame = Call.getCalleeStackFrame(BlockCount); + if (!StackFrame) ---------------- NoQ wrote: > It is clear that the final return value of `getConstructionContext()` does > not depend on the current block count; in fact the information you're trying > to obtain is purely syntactic. You probably might as well pass 0 here. As i > mentioned before, the ideal solution would be to make `CallEvent` carry a > `CFGElementRef` to begin with (as a replacement for the origin expr), so that > it didn't need to be recovered like this. > > We should absolutely eliminate the `BlockCount` parameter of > `getReturnValueUnderConstruction()` as well. Checker developers shouldn't > understand what it is, especially given that it doesn't matter at all which > value do you pass. OK, I eliminated it. Do we need `BlockCount` for `getParameterLocation()` (the other patch)? ================ Comment at: clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp:112 +Optional<SVal> ExprEngine::retrieveFromConstructionContext( + ProgramStateRef State, const LocationContext *LCtx, ---------------- NoQ wrote: > baloghadamsoftware wrote: > > martong wrote: > > > baloghadamsoftware wrote: > > > > NoQ wrote: > > > > > balazske wrote: > > > > > > baloghadamsoftware wrote: > > > > > > > baloghadamsoftware wrote: > > > > > > > > baloghadamsoftware wrote: > > > > > > > > > baloghadamsoftware wrote: > > > > > > > > > > NoQ wrote: > > > > > > > > > > > baloghadamsoftware wrote: > > > > > > > > > > > > NoQ wrote: > > > > > > > > > > > > > Please instead re-use the code that computes the > > > > > > > > > > > > > object under construction. That'll save you ~50 lines > > > > > > > > > > > > > of code and will be more future-proof (eg., > > > > > > > > > > > > > standalone temporaries without destructor technically > > > > > > > > > > > > > have a construction context with 0 items so when we > > > > > > > > > > > > > implement them correctly your procedure will stop > > > > > > > > > > > > > working). > > > > > > > > > > > > That was so my first thought. However, > > > > > > > > > > > > `handleConstructionContext()` is private and > > > > > > > > > > > > non-static. Now I tried to merge the two methods: if > > > > > > > > > > > > the value is already in the construction context, we > > > > > > > > > > > > return it, if not then we add it. Is this what you > > > > > > > > > > > > suggest? Or did I misunderstand you? At the very > > > > > > > > > > > > beginning I tried to simply use > > > > > > > > > > > > `handleConstructionContext()`, but it asserted because > > > > > > > > > > > > the value was already in the map. > > > > > > > > > > > I'd like to preserve the assertion that > > > > > > > > > > > objects-under-construction are never filled twice; it's a > > > > > > > > > > > very useful sanity check. What you need in your checker > > > > > > > > > > > is a function that computes object-under-construction but > > > > > > > > > > > doesn't put it into the objects-under-construction map. > > > > > > > > > > > So you have to separate the computation from filling in > > > > > > > > > > > the state. > > > > > > > > > > OK, so I (fortunately) misundertood you. Thus I should > > > > > > > > > > refactor this function to a calculation and a storing part? > > > > > > > > > OK, I see what you are speaking about, but I have no idea how > > > > > > > > > to do it properly. The problem is that the control logic of > > > > > > > > > filling in the state also depends on the kind of the > > > > > > > > > construction context. For some kinds we do not fill at all. > > > > > > > > > Every way I try it becomes more complex and less correct: > > > > > > > > > > > > > > > > > > 1) `NewAllocatedObjectKind`: we do not add this to the state, > > > > > > > > > we only retrieve the original. > > > > > > > > > 2) `SimpleReturnedValueKind` and > > > > > > > > > `CXX17ElidedCopyReturnedValueKind`: depending on whether we > > > > > > > > > are in top frame we handle this case recursively or we do not > > > > > > > > > fill at all, just return the value. What is the construction > > > > > > > > > context item here? Maybe the `ReturnStmt`? > > > > > > > > > 3) `ElidedTemporaryObjectKind`: this is the most problematic: > > > > > > > > > we first handle it recursively for the construction context > > > > > > > > > after elision, then we also fill it for the elided temporary > > > > > > > > > object construction context as well. > > > > > > > > > > > > > > > > > > The only thing I can (maybe) do is to retrieve the > > > > > > > > > construction context item. But then the switch is still > > > > > > > > > duplicated for filling, because of the different control > > > > > > > > > logic for different construction context kinds. > > > > > > > > > The only thing I can (maybe) do is to retrieve the > > > > > > > > > construction context item. > > > > > > > > > > > > > > > > This does not help even for retrieving the value, because its > > > > > > > > control logic also depends on the construction context kind. If > > > > > > > > I do it, it will be code triplication instead of duplication > > > > > > > > and results in a code that is worse to understand than the > > > > > > > > current one. > > > > > > > > > > > > > > > > > > > > > > > > Please instead re-use the code that computes the object under > > > > > > > > construction. That'll save you ~50 lines of code and will be > > > > > > > > more future-proof (eg., standalone temporaries without > > > > > > > > destructor technically have a construction context with 0 items > > > > > > > > so when we implement them correctly your procedure will stop > > > > > > > > working). > > > > > > > > > > > > > > Any solution I can come up with rather adds 100 to 150 lines to > > > > > > > the code instead of saving 50. For retrieving we only have to > > > > > > > determine the construction context item (the key). For storing we > > > > > > > also have to calculate the value. It sounds good to calculate > > > > > > > these things in separate functions and then have a simple > > > > > > > retriever and store function but there are lots of recursions, > > > > > > > double strores, non-stores, retrieving in the store function that > > > > > > > make it too complicated. > > > > > > > > > > > > > > Also retrieving is more complex than just determining the item > > > > > > > and getting the value from the context: if you look at > > > > > > > `SimpleReturnedValueKind` and `CXX17ElidedCopyReturnedValueKind` > > > > > > > you can see that we do not use the construction context we got in > > > > > > > the parameter (the construction context of the return value) but > > > > > > > the construction context of the call instead. For > > > > > > > `ElidedTemporaryObjectKind` we take the construction context > > > > > > > before the elusion. > > > > > > > > > > > > > > Future proofness: I agree, this is a problem to solve. In cases > > > > > > > where the construction context is empty maybe we can move the > > > > > > > calculation of the values to a separate function that is called > > > > > > > by both the "handler" and the "retriever". > > > > > > Do I think correctly that `retrieveFromConstructionContext` (in the > > > > > > previous version) should return the same value as second part of > > > > > > `handleConstructionContext`? It does not look obvious at first. Or > > > > > > do we need a function with that purpose? If yes it looks a > > > > > > difficult task because the similar "logic" to get the value and > > > > > > update the state. Probably it is enough to add a flag to > > > > > > `handleConstructionContext` to not make new state. The current code > > > > > > looks bad because calling a "handle" type of function (that was > > > > > > private before) only to compute a value. > > > > > > > > > > > > Any solution I can come up with rather adds 100 to 150 lines to the > > > > > > code instead of saving 50. > > > > > > > > > > I think the following is a fairly literal implementation of my > > > > > suggestion: > > > > > > > > > > {F12020285} > > > > > > > > > > It's 26 lines shorter than your `retrieveFromConstructionContext()` > > > > > (ok, not 50, duplicating the switch was more annoying than i > > > > > expected), it adds zero new logic (the only new piece of logic is to > > > > > track constructors that were modeled correctly but their elision > > > > > wasn't; previously we could fit it into control flow), it no longer > > > > > tries to reverse-engineer the original behavior by duplicating its > > > > > logic, and it's more future-proof for the reasons explained above. > > > > Thank you for working on this. The point what I did not see (and I > > > > still no not see it) is that in your solution (and your proposal) > > > > instead of retrieving the value from the construction context we > > > > actually recalculate it, thus we do not use the construction context at > > > > all (except for `NewAllocatedObjectKind`) to retrieve the location of > > > > the return value. Can we guarantee that we always recalculate exactly > > > > the same symbolic value as the symbolic values we store in the map? For > > > > example, for `SimpleReturnedValueKind` et. al. we conjure a new > > > > symbolic value. Is it really the same value than that the one in the > > > > map, thus the one which is the real location where the object is > > > > constructed? If not, then the checkers might not work correctly because > > > > they store into the GDM using a different region as key than they use > > > > for retrieving the value. > > > > I think the following is a fairly literal implementation of my > > > > suggestion: ... It's 26 lines shorter than your > > > > retrieveFromConstructionContext() (ok, not 50, duplicating the switch > > > > was more annoying than i expected), it adds zero new logic (the only > > > > new piece of logic is to track constructors that were modeled correctly > > > > but their elision wasn't; previously we could fit it into control > > > > flow), it no longer tries to reverse-engineer the original behavior by > > > > duplicating its logic, and it's more future-proof for the reasons > > > > explained above. > > > > > > +1 for this version. Even though Adam has valid concerns, this version > > > contains functions with clear responsibilities. > > > > > > > Can we guarantee that we always recalculate exactly the same symbolic > > > > value as the symbolic values we store in the map? For example, for > > > > SimpleReturnedValueKind et. al. we conjure a new symbolic value. Is it > > > > really the same value than that the one in the map, thus the one which > > > > is the real location where the object is constructed? > > > > > > I think the guarantee is that in Artem's implementation the switch is > > > duplicated. That would be okay for now, but I still consider this > > > error-prone, exactly because of the duplication. So, maybe this yields > > > for a common Visitor. > > > > > > Guys, would it be possible to have a common Visitor for these? Could be > > > somewhat similar to `DeclVisitor` with return type as type parameter. And > > > we could pass a lambda that is called in every `case`? Or would that be > > > too overkill > > > > > > I think the guarantee is that in Artem's implementation the switch is > > > duplicated. > > > > That is not my point. My point is that `handleConstructionContext()` is > > called, a value is calculated and stored in the map. This calculation may > > be the creation of a new conjured symbol. Thereafter, whenever a checker > > tries to retrieve the location of the return value from the construction > > context in Artem's solution we do not retrieve the value previously stored > > in the map, but calculate a new one, which is not necessarily the same as > > the one we stored in the map, because a in some cases a new symbol is > > conjured. Then the checker uses this new symbol as the index in GDM, but > > the object is constructed to the location stored in the map. Therefore, > > later when the checker tries to retrieve data from the GDM using the > > location of the existing object as a key it cannot find the data because it > > was stored using the wrong key. > > > > `getReturnValueUnderConstruction()` must always return the location where > > the object is to be constructed. Therefore I think it must not recalculate > > the location which may involve creation of new conjured symbols but first > > it must look into the map of objects under construction. This guarantees > > that the returned location is really the location where the object is > > constructed. > Yes it should always return the same value, because there is exactly one > correct solution to the problem that it's solving. If it may return different > values for the same `(E, State, LCtx, CC)` tuple, it's a bug. This method > should ultimately be turned into a static method. > > Loading a stored value if it exists will only sweep the problem under the > rug. It'd be much better for us to //assert// that the freshly recomputed > value is equal to the stored value if the latter is present. > > The code that constructs symbolic region for the return value of a top frame > is indeed suspicious. I'm not sure it can actually return different regions > twice but this code is clearly incorrect because `SymbolConjured` is never > the correct solution anyway. The only reason `SymbolConjured` is used > anywhere is because the author (in this case, i) was too lazy to define a new > symbol class that has the right identity. `SymbolConjured` is like > `UnknownVal` that's a bit less unknown but still screams "not implemented > yet". I think this definitely deserves a fixme. > > > I think the guarantee is that in Artem's implementation the switch is > > duplicated. That would be okay for now, but I still consider this > > error-prone, exactly because of the duplication. So, maybe this yields for > > a common Visitor. > > Guys, would it be possible to have a common Visitor for these? > > Mmm, i don't think i understand this argument. A visitor //is// a switch with > a fancy interface and wildcards and lacking checks for unhandled cases. > > Generally though i wouldn't mind having a `ConstructionContextVisitor`. For > instance, that'll allow us to replace code like > ```lang=c++ > case ConstructionContext::CXX17ElidedCopyVariableKind: > case ConstructionContext::SimpleVariableKind: { > } > ``` > with a single > ```lang=c++ > VisitVariable(VariableConstructionContext *CC) { > } > ``` > > I wouldn't like a visitor for `ConstructionContextItem`s because they are > fairly meaningless when taken out of, well, //context//. A `Visitor` is a good idea. However, is it used anywhere else? If it is, then it is worth to do it, but I think it can go into a new patch. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80366/new/ https://reviews.llvm.org/D80366 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits