[PATCH] D52984: [analyzer] Checker reviewer's checklist

Artem Dergachev via Phabricator via cfe-commits Fri, 19 Oct 2018 17:20:57 -0700

NoQ added a comment.

My take on the out-of-alpha checklist:


- The checker should be evaluated on a large codebase in order to discover 
crashes and false positives specific to the checker. It should demonstrate the 
expected reasonably good results on it, with the stress on having as little 
false positives as possible.
  - The codebase should contain patterns that the checker checks. It should 
ideally also contain actual bugs, but it is fine if the checker was inspired by 
a one-of-a-kind devastating bug while there aren't many other instances of this 
bug around.

- Warning and note messages should be clear and easy to understand, even if a 
bit long.
  - Messages should start with a capital letter (unlike Clang warnings!) and 
should not end with `.`.
  - Articles are usually omitted, eg. `Dereference of a null pointer` -> 
`Dereference of null pointer`.

- Static Analyzer's checker API saves you from a lot of work, but it still 
forces you into a certain amount of boilerplate code when it comes to managing 
entities specific to your checker:
  - Almost all path-sensitive checkers need their own `BugReporterVisitor` in 
order to highlight interesting events along the path - or at least a call to 
`trackNullOrUndefValue`. //(need to wait for https://reviews.llvm.org/D52758)//
    - For example, SimpleStreamChecker should highlight the event of opening 
the file when reporting a file descriptor leak.
  - If the checker tracks anything in the program state, it needs to implement 
the `checkDeadSymbols` callback to clean the state up.

- Unless there's a strong indication that the user is willing to opt-in to a 
stricter checking, obvious false positives that could be caused by the checker 
itself should be fixed. For example:
  - If a checker tracks state of mutable objects enumerated by symbols, 
escaping symbols into unfamiliar functions must invalidate the state - for 
example, through `checkPointerEscape`.
    - For example, SimpleStreamChecker should drop its knowledge about an open 
file descriptor when it is passed into a function that may potentially close 
the descriptor. At the same time, it is fine to not escape closed descriptors.
  - If the checker is intentionally leaving a room for false positives, i.e., 
it checks for a coding standard that isn't critical to follow when it comes to 
formal correctness of the program, then there should be a cheap explicit way to 
suppress the warnings when the violation of the coding standard is intentional.
  - Fuzzy matching of API function or type names is generally dangerous when it 
may lead to false positives, fine otherwise.
    - For example, a checker for LLVM `isa<>` API shouldn't match it as `isa*`, 
so that it didn't accidentally make assumptions on how `isalpha` works.
    - It is perfectly fine to have some fuzziness for initial experiments in 
alpha - that's a great way to figure out which APIs you want to cover.

My take on the everyday checklist:

- Checkers are encouraged to actively participate in the analysis by sharing 
its knowledge about the program state with the rest of the analyzer, but they 
should not be disrupting the analysis unnecessarily:
  - If a checker splits program state, this must be based on knowledge that the 
newly appearing branches are definitely possible and worth exploring from the 
user's perspective. Otherwise the state split should be delayed until there's 
an indication that one of the paths is taken, or one of the paths needs to be 
dropped entirely.
    - For example, it is fine to eagerly split paths while modeling 
`isalpha(x)` as long as `x` is constrained accordingly on each path. At the 
same time, it is not a good idea to split paths over the return value of 
`printf()` while modeling the call because nobody ever checks for errors in 
`printf`; at best, it'd just double the remaining analysis time.
    - Caution is advised when using `CheckerContext::generateNonFatalErrorNode` 
because it generates an independent transition, much like `addTransition`. It 
is easy to accidentally split paths while using it.
      - Ideally, try to structure the code so that it was obvious that every 
`addTransition` or `generateNonFatalErrorNode` (or sequence of such if the 
split is intended) is immediately followed by return from the checker callback.
  - Multiple implementations of `evalCall` in different checkers should not 
conflict.
  - When implementing `evalAssume`, the checker should always return a non-null 
state for either the true assumption or the false assumption (or both).
  - Checkers shall not mutate values of expressions, i.e. use the 
`ProgramState::BindExpr` API, unless they are fully responsible for computing 
the value. Under no circumstances should they change non-`Unknown` values of 
expressions.
    - Currently the only valid use case for this API in checkers is to model 
the return value in the `evalCall` callback.
    - If expression values are incorrect, `ExprEngine` needs to be fixed 
instead.



- The following checker code patterns are not wrong but very suspicious:
  - In `State->getSVal(Region)`, `Region` is not necessarily a 
`TypedValueRegion` and the optional type argument is not specified.
    - It is likely that the checker may accidentally try to dereference a void 
pointer.
  - Checker logic depends on whether a certain value is a `Loc` or `NonLoc`.
    - It should be immediately obvious whether the `SVal` is a `Loc` or a 
`NonLoc` depending on the AST that is being checked.
    - Checking whether a value is `Loc` or `Unknown`/`Undefined` or whether the 
value is `NonLoc` or `Unknown`/`Undefined` is totally fine.
  - New symbols are constructed in the checker via direct calls to 
`SymbolManager`, unless they are of `SymbolMetadata` class tagged by the 
checker, or they represent newly created values such as the return value in 
`evalCall`.
    - For modeling arithmetic/bitwise/comparison operations, `SValBuilder` 
should be used.
  - Custom `ProgramPointTag`s are created within the checker.
    - There is usually no good reason for a checker to chain multiple nodes 
together, because checkers aren't worklists.
  - A `BugReporterVisitor` does something different from `if 
(N->getState()->get<Trait>(Key) != 
N->getFirstPred()->getState()->get<Trait>(Key)) emitSomeNote();`.
    - That's the easiest and safest idiom to highlight the changes in the 
program state.
    - It is not necessary to re-hardcode all sorts of statements on which the 
checker reacts into the visitor.

- Use safe and convenient APIs!
  - Always use `CheckerContext::generateErrorNode` and 
`CheckerContext::generateNonFatalErrorNode` for emitting bug reports. Most 
importantly, never emit report against `CheckerContext::getPredecessor`.
  - Prefer `checkPreCall` and `checkPostCall` to `checkPreStmt<CallExpr>` and 
`checkPostStmt<CallExpr>`.
  - Use `CallDescription` to detect hardcoded API calls in the program.
  - Simplify `C.getState()->getSVal(E, C.getLocationContext())` to 
`C.getSVal(E)`.

- Common sources of crashes:
  - `CallEvent::getOriginExpr` is nullable - for example, it returns null for 
an automatic destructor of a variable.
    - The same applies to all values generated while the call was modeled, eg. 
`SymbolConjured::getStmt` is nullable.
  - `CallEvent::getDecl` is nullable - for example, it returns null for a call 
of symbolic function pointer.
  - `addTransition`, `generateSink`, `generateNonFatalErrorNode`, 
`generateErrorNode` are nullable because you can transition to a node that you 
have already visited.
  - Methods of `CallExpr`/`FunctionDecl`/`CallEvent` that return arguments 
crash when the argument is out-of-bounds.
    - If you checked the function name, it doesn't mean that the function has 
the expected number of arguments!
      - Which is why you should use `CallDescription`.
  - Nullability of different entities within different kinds of symbols and 
regions is usually documented via assertions in their constructors.

In https://reviews.llvm.org/D52984#1258724, @MTC wrote:

> - More general coding standards.


We have https://llvm.org/docs/CodingStandards.html but yeah, having 
Analyzer-specific coding standards is cool. I guess i listed a few items above.



================
Comment at: www/analyzer/checker_dev_manual.html:710
+<ul>
+<li>Is there unintended branching in the exploded graph?</li>
+<li>Are there unintended sinks?</li>
----------------
xazax.hun wrote:
> NoQ wrote:
> > Most importantly, "If addTransition() with unspecified predecessor is ever 
> > executed after generateNonFatalErrorNode(), is the state split intended?"
> Is generateNonFatalErrorNode special? Two addTransition calls could have 
> similar unintended effect, though I can imagine that being more rare. 
`generateNonFatalErrorNode()` is special because it's too counter-intuitive and 
hard to remember that it is in fact equivalent to `addTransition()`, i.e. it 
isn't "the same thing as `generateErrorNode()` just sounds less scary" :)

This becomes really weird when there are multiple checks done in callees of the 
checker callback, only some of which are non-fatal, followed by 
`addTransition()` at the end of the callback. In this case this problem makes 
the code a lot more complex: you need to pass the current node around through 
all callees.


https://reviews.llvm.org/D52984



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D52984: [analyzer] Checker reviewer's checklist

Reply via email to