xazax.hun added inline comments.

================
Comment at: 
clang/unittests/Analysis/FlowSensitive/ChromiumCheckModelTest.cpp:122
+  void transfer(const Stmt *S, NoopLattice &, Environment &Env) {
+    M.transfer(S, Env);
+  }
----------------
ymandel wrote:
> xazax.hun wrote:
> > ymandel wrote:
> > > xazax.hun wrote:
> > > > I wonder whether the models should actually be called by the framework 
> > > > at some point. 
> > > > E.g. imagine the following scenario:
> > > > ```
> > > > void f()
> > > > {
> > > >     std::optional<int> o(5);
> > > >     if (o)
> > > >     {
> > > >         // dead code here;
> > > >     }
> > > > }
> > > > ```
> > > > 
> > > > In an ideal case, an analysis could use the `std::optional` modeling to 
> > > > realize that the code in the `if` statement is dead and use this fact 
> > > > to improve its precision. Explicitly requesting the modeling in the 
> > > > transfer function works OK when we only have a couple of things to model. 
> > > > But it might not scale in the future. When we model dozens of standard 
> > > > types and functions we would not want all the analysis clients to 
> > > > invoke all the transfers for all the models individually. 
> > > Agreed. It seems similar to the problems that motivated DLLs back in the 
> > > day. There's clearly a lot to be worked out here in terms of how best to 
> > > support composition. It's probably worth an RFC or somesuch to discuss in 
> > > more depth.
> > Having an RFC and some deeper discussions would be great. I also wonder 
> > whether modeling arbitrary `Stmt`s is the right approach. The peculiarities 
> > of the language should probably be modelled by the framework itself without 
> > any extensions. Maybe we only want the modeling of certain function calls 
> > to be customizable?
> Good question. It would be really nice if we could draw this line, but I have 
> a bad feeling that it won't be so simple. :) Still, worth looking at our 
> existing models, and new ones that we're developing, to see if we can find a 
> clear "bounding box".
> 
> What does CSA do in this regard?
In the CSA, there is no clear distinction between modeling and diagnostics; 
each check can do both. Historically, this turned out to be a bad idea. When a 
check does both modeling and diagnostics, users can end up turning it off due 
to some false positives, and then get worse results from other checks because 
some critical piece of modeling is also turned off. (E.g., even if a 
std::optional check produces a couple of false positives, it might still be 
beneficial to do the modeling in the background without the diagnostics, 
because it might help discover infeasible paths, and that can improve the 
precision of other checks.) 

Nowadays, we try to make the distinction clear: some checks are modeling-only 
and others are diagnostic-only (the latter might still have their own modeling, 
but they do not affect the global analysis state, i.e., diagnostic-only checks 
should not be able to affect each other). This distinction is currently a 
best-effort approach and is not enforced by any of the APIs.

In the CSA, the checker APIs are very powerful. E.g., if there is a pointer 
with an unknown value and we see a dereference, we can continue the analysis 
with the assumption that the pointer is non-null. These assumptions can be 
added by the framework itself or by a regular check. Over time, we have been 
trying to move as much modeling as possible into (non-diagnostic) checks to 
keep the framework lightweight, but most of the meat is still in the framework.

To model libraries, we are using the evalCall callback: 
https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Checkers/CheckerDocumentation.cpp#L229

Roughly speaking, the model looks like this:
* The analyzer encounters a function call, so it asks all the checks in 
sequence whether any of them wants to model it.
* A check gets the evalCall callback and can do whatever it wants. Most checks 
will return false most of the time, as they are only expected to handle a 
small subset of functions.
* The first check returning true short-circuits this process.
* If none of the checks returned true, the framework falls back to a default 
modeling, which is a conservative approximation of the call, i.e., 
invalidating the bindings that could be changed by the function (globals, 
output arguments, return values, etc.)

The model above assumes that when a check returns true, it models all aspects 
of the function (like invalidation). A downside is that it does not solve 
composition well: when multiple checks want to model the same function, the 
framework will just pick one of them. Also, modeling types is really 
challenging. E.g., if a modeling check models the constructor of a type but 
not its destructor, the framework will fall back to the default modeling for 
the dtor. The problem is that, in that case, the ctor was not modeled by the 
framework (but by the modeling check), so some invariants were never 
established. This can result in false positives or even crashes. Overall, 
modeling types is almost an all-or-nothing endeavor: if a check models at 
least one method of a type, it is very likely that it will need to model a 
bunch more to ensure a good experience.

At Microsoft, we have a framework similar to the CSA called EspXEngine. In 
EspXEngine, we have different APIs for modeling and diagnostics, so the API 
enforces that diagnostic checks cannot influence each other. Our model for 
modeling checks is very similar to the CSA's, but the idea is that the 
modeling checks are maintained by the authors of EspXEngine (while diagnostic 
checks can be written by anyone), so conflicting models are less of a problem.

Both EspXEngine and the CSA have problems chaining modeling checks. E.g., even 
if the CSA has modeling for `unique_ptr` and `optional`, neither 
`optional<unique_ptr<int>>` nor `unique_ptr<optional<int>>` will model every 
aspect of the inner type out of the box. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121797/new/

https://reviews.llvm.org/D121797
