[PATCH] D103605: [analyzer] Introduce a new interface for tracking

Valeriy Savchenko via Phabricator via cfe-commits Fri, 04 Jun 2021 03:51:35 -0700

vsavchenko added a comment.

In D103605#2798171 <https://reviews.llvm.org/D103605#2798171>, @NoQ wrote:


> Ok. Oof. Whoa. That's amazing.
>
> Let me re-iterate to make sure that my understanding of this patchset is 
> correct.
>
> **Chapter 1.** //"A is correlated with B (ρ = 0.56), given C, assuming D and 
> under E conditions"//
>
> - Two new entities are introduced: trackers and handlers.

Correct.

> - Trackers are attached to bug reports. A single bug report may have multiple 
> trackers.
>   - Why not have only one tracker per report?

That's a good question, we can actually even make it a part of `BugReport` for 
that matter.  I can though think of one exotic case, where you wait till you 
encounter certain weird condition, where you need to spawn a very specific 
customized tracker only starting from that point.

> - Handlers are attached to trackers. Each tracker can have multiple handlers 
> with a priority system.
>   - Do you envision it growing into a dependency system later?

Yes, I think in order to correctly split some parts of the logic that already 
exists in `trackExpressionValue`

> - A subclass of visitors known as "tracking visitors" are connected to 
> trackers. One tracker may have multiple visitors attached to it.

Correct.

> - Trackers are a tiny, almost uncustomizable class. It contains is a method 
> called `track()` that is invoked when attention of handlers is required.

Correct.  If you really want to, you can change the default behavior of those 
track functions.

> - Each tracker comes with a default set of handlers.

Correct.  This is how we migrate existing functionality into the new interface.

> - The only way to customize a tracker is to add extra handlers to it.

It's not the only one, you can subclass it and add completely different logic.  
But the main way for the majority of use cases is indeed through handlers.

> - Handlers implement a `handle()` method invoked by the tracker every time 
> their attention is required in `track()`.

Correct.

> - Handlers may return a note from their `handle()` method. If a handler 
> returns a note, this note is attached to the bug report and lower-priority 
> handlers are not invoked at all.

Not entirely correct, this is correct for store handlers, these are intended to 
customize messages for events of symbolic value being stored into the region.
Expression handlers return `Tracker::Result` that simply tells if some visitor 
was added or not (backward compatibility with `trackExpressionValue`) and if we 
should prevent all other handlers from running here.

> - There are two major classes of handlers: expression handlers and store 
> handlers. Expression handlers pay attention to Environment entries, store 
> handlers pay attention to Region Store bindings.

I would say that expression handler is a way to externally extend 
`trackExpressionValue` functionality.  Add new supported type of expression 
that we can track or add checker-specific visitor recursively.

> **Chapter 2.** //"For immediate release: Scientists find potential link 
> between A and B"//
>
> - Bug visitors are still responsible for discovering connections across 
> exploded nodes. On the contrary, `track()`/`handle()` workflow is 
> instantaneous.

It is tricky because `track` can add visitors and they can call it back from 
`VisitNode`, and so on to infinity.  This is a pattern that we had between 
`trackExpressionValue` and `FindLastBRVisitor` (and `ReturnVisitor`) and every 
checker can create their own recursive tracking.

> - It sounds like this whole patchset is orthogonal to reimplementing value 
> tracking visitors with note tags, as long as we can pass a reference to the 
> appropriate tracker into the tag(?)

Correct, it just makes existing back-slicing tracking easier to customize.

> - Tracking starts when a checker creates a tracker and invokes `track()` on 
> it, which instantly causes the respective `handle()` to be invoked.
> - `handle()` may attach visitors. These visitors may in turn call `track()` 
> as they reach a different exploded node.

Exactly.

> - The termination condition is as usual: no further visitors attached.
>   - It sounds like this patchset is orthogonal to the problem of detecting 
> the tracking termination point(?) It does make it easier to implement 
> customized reactions to such termination though.

Actually, tracker dies when this happens.  But hey, can you provide a use case 
when the checker really needs to know when this happens?

> **Chapter 3.** //"A causes B all the time. What will this means for Obama?"//
>
> - `trackExpressionValue()` now boils down to creation of a single tracker and 
> an initial invocation of `track()` on the expression which in turn causes 
> expression handlers to immediately react.

Correct.

> - Most of the old body of `trackExpressionValue()` is transformed into a 
> default expression handler. So it's still part of an immediate reaction, just 
> a bit more generalized.

Correct.

> - Most of the visitors that previously composed `trackExpressionValue()` are 
> still part of the process; they're attached by the default expression handler.

Correction: "**All** of the visitors..."

> - However, instead of adding notes directly, they simply invoke `track()`. 
> All note-attaching work is moved into additional default handlers.

No, they call `track` where they used to call `trackExpressionValue` or add 
`FindLastStoreBRVisitor`.
Right now, only stores received their special customization and can produce 
notes based on tracker customization.

> **Chapter 4.** //"I'm wearing this to ward off A!"//
>
> - Basic usage in checkers is basically unchanged. They still simply call 
> `trackExpressionValue()` and it just works.

Correct

> - Advanced usage - which is the whole point of this patchset - probably 
> implies adding custom handlers to the tracker created by 
> `trackExpressionValue()`.

Correct

> - Let's consider an example. Suppose you're implementing `evalCall` for 
> `llvm::dyn_cast`. You could have your expression handler detect a modeled 
> `dyn_cast` call and implement its `handle()` method to emit a 
> checker-specific note and start tracking the argument.

So, in here it would look like this: If I want to support `dyn_cast` for 
tracking, I will add extra `ExpressionHandler` that calls `track` on the 
expression inside of `llvm::dyn_cast`.  Additionally, if you want to change the 
note's message for situations like `a = dyn_cast<Ty>(b)`, you can add a 
`StoreHandler` that will produce a note like "a is casted from b".

> - With this implementation such note will only be emitted if the value is 
> tracked back to `dyn_cast`. Not every modeled `dyn_cast` will receive a note 
> but only the one that, say, caused a null dereference on its return value 
> when it failed.

Correct.

> - Now, how does this interact with note tags? The idea behind note tags is so 
> that visitors didn't need to reverse-engineer the analysis.

I mean, it's just an orthogonal mechanism.

> - It sounds like the handler is a step back to the visitor world: it doesn't 
> know how exactly was `dyn_cast` modeled during analysis, so it has to 
> reverse-engineer what happened.

Correct, but during the analysis, checker could've modeled a zillion of 
different `dyn_casts`, and we don't want to have notes on all of them.

> - For example, in order to find out whether `dyn_cast` succeeded or failed, 
> it has to explore the program state to find out which values are passed into 
> the cast and returned from the cast.
> - Whereas a note tag would have this information accessible directly.

Maybe we can access the note tag from the handler and kill two birds with one 
stone?

> - Hence a suggestion. Maybe instead of having a system of handlers, allow 
> visitors to //ask// the tracker whether `track()` was invoked at this node on 
> a certain expression or region?

There are still situations when you don't have tags, but want to hook into the 
tracking process, so you can add your custom feature.  Let's say you modeled a 
system call `foo(a, b)` and you want to track `a` and `b`.  Or more generally 
if you want to help tracking when it's stuck.
Additionally, handlers help to tear existing `trackExpressionValue` 
implementation apart.

> - I.e., polling instead of callbacks.
> - This would naturally integrate with note tags.

My counter-suggestion is to create a flag on `NoteTag` (maybe `Prunable` 
already can fit into that, I'm not very familiar with this interface) that 
means "this tag should appear only when part of the tracking trace", and make a 
handler-visitor pair that will produce notes with messages in those tags.
IMO, it will prevent a ton of checkers repeating very similar code: make 
visitor that will check if some expression is tracked and generate a note out 
of tag

> - This does not solve the priority problem. We could try to solve it 
> separately by, say, enforcing at most one note per exploded node, and then 
> implementing a similar priority system over visitors.
> - Or am I missing something about your solution that makes connection to note 
> tags a non-issue?

My understanding is that tags don't rule out tracking.  The other way around is 
also true.  My take on this - let's make tracking a bit more flexible, it's not 
gonna hurt anyone.
Another aspect that I think is very important here is the ability to make this 
change incrementally and to extract some parts of `trackExpressionValue` and 
`FindLastStoreBRVisitor` back into checkers to make it at least a bit easier 
for the reader.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103605/new/

https://reviews.llvm.org/D103605

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D103605: [analyzer] Introduce a new interface for tracking

Reply via email to