[PATCH] D103605: [analyzer] Introduce a new interface for tracking

Artem Dergachev via Phabricator via cfe-commits Fri, 04 Jun 2021 13:10:45 -0700

NoQ added a comment.

In D103605#2798581 <https://reviews.llvm.org/D103605#2798581>, @vsavchenko 
wrote:

> In D103605#2798171 <https://reviews.llvm.org/D103605#2798171>, @NoQ wrote:
>
>> - Trackers are attached to bug reports. A single bug report may have 
>> multiple trackers.
>>   - Why not have only one tracker per report?
>
> That's a good question, we can actually even make it a part of `BugReport` 
> for that matter.  I can though think of one exotic case, where you wait till 
> you encounter certain weird condition, where you need to spawn a very 
> specific customized tracker only starting from that point.

Ok, let's take a simpler example:

  int foo(int D) {
    int A = 1;
    int B = 1;
    int C = B - A;
    return D / C; // division by zero
  }

And suppose we want to track `A` and `B` so that to explain why we think that 
`C` is zero. What are pros and cons of:
(1) tracking `A` and `B` within the same tracker that already helped us reach 
`C`?
(2) terminating the current tracker as it reaches `C` and spawning two separate 
trackers for `A` and `B`?

>> - The termination condition is as usual: no further visitors attached.
>>   - It sounds like this patchset is orthogonal to the problem of detecting 
>> the tracking termination point(?) It does make it easier to implement 
>> customized reactions to such termination though.
>
> Actually, tracker dies when this happens.  But hey, can you provide a use 
> case when the checker really needs to know when this happens?

Inlined defensive check suppressions are a major example. Basically null 
dereference warnings are suppressed based on properties of the program point at 
which the tracker terminates. If we know when this happens, we can refactor 
inlined defensive check suppressions into a checker specific handler.

>> - However, instead of adding notes directly, they simply invoke `track()`. 
>> All note-attaching work is moved into additional default handlers.
>
> No, they call `track` where they used to call `trackExpressionValue` or add 
> `FindLastStoreBRVisitor`.
> Right now, only stores received their special customization and can produce 
> notes based on tracker customization.

What's the ultimate plan though, do we want to fully move all notes into 
handlers?

>> - Let's consider an example. Suppose you're implementing `evalCall` for 
>> `llvm::dyn_cast`. You could have your expression handler detect a modeled 
>> `dyn_cast` call and implement its `handle()` method to emit a 
>> checker-specific note and start tracking the argument.
>
> So, in here it would look like this: If I want to support `dyn_cast` for 
> tracking, I will add extra `ExpressionHandler` that calls `track` on the 
> expression inside of `llvm::dyn_cast`.  Additionally, if you want to change 
> the note's message for situations like `a = dyn_cast<Ty>(b)`, you can add a 
> `StoreHandler` that will produce a note like "a is casted from b".
>
>> - With this implementation such note will only be emitted if the value is 
>> tracked back to `dyn_cast`. Not every modeled `dyn_cast` will receive a note 
>> but only the one that, say, caused a null dereference on its return value 
>> when it failed.
>
> Correct.
>
>> - Now, how does this interact with note tags? The idea behind note tags is 
>> so that visitors didn't need to reverse-engineer the analysis.
>
> I mean, it's just an orthogonal mechanism.
>
>> - It sounds like the handler is a step back to the visitor world: it doesn't 
>> know how exactly was `dyn_cast` modeled during analysis, so it has to 
>> reverse-engineer what happened.
>
> Correct, but during the analysis, checker could've modeled a zillion of 
> different `dyn_casts`, and we don't want to have notes on all of them.
>
>> - For example, in order to find out whether `dyn_cast` succeeded or failed, 
>> it has to explore the program state to find out which values are passed into 
>> the cast and returned from the cast.
>> - Whereas a note tag would have this information accessible directly.
>
> Maybe we can access the note tag from the handler and kill two birds with one 
> stone?

If we can, one way or another, I believe we absolutely should kill two birds 
with one stone. We shouldn't have two mutually exclusive solutions for two 
aspects of the same problem. In order to produce good notes, it's necessary to 
*both* know what happened during analysis *and* know whether the value is 
tracked.

>> - Hence a suggestion. Maybe instead of having a system of handlers, allow 
>> visitors to //ask// the tracker whether `track()` was invoked at this node 
>> on a certain expression or region?
>
> There are still situations when you don't have tags, but want to hook into 
> the tracking process, so you can add your custom feature.  Let's say you 
> modeled a system call `foo(a, b)` and you want to track `a` and `b`.  Or more 
> generally if you want to help tracking when it's stuck.

Yes, so like in my `a = dyn_cast<...>(b)` example the proposed solution would 
be like:

  if (isTracked(a)) {
    emitNote();
    track(b);
  }

Such code can be put into either a visitor or a note tag depending on what you 
have. It is entirely equivalent to the current prevalent idiom

  if (isInteresting(a)) {
    emitNote();
    track(b);
  }

except it's asking the right question, with your facility allowing to answer 
it. This approach is very easy to understand for checker developers, basically 
adds no extra code (doesn't require people to inherit classes).

> Additionally, handlers help to tear existing `trackExpressionValue` 
> implementation apart.
>
>> - I.e., polling instead of callbacks.
>> - This would naturally integrate with note tags.
>
> My counter-suggestion is to create a flag on `NoteTag` (maybe `Prunable` 
> already can fit into that, I'm not very familiar with this interface) that 
> means "this tag should appear only when part of the tracking trace", and make 
> a handler-visitor pair that will produce notes with messages in those tags.

I think this works too. I like the idea that like note tags are for //execution 
paths//, we could make "track tags" for //value paths// (or, well, "value 
tracks"? - we should really use this concept more often, it's very effective; 
this patchset separates these two notions of a path that we were previously 
confusing).

It sounds a bit more restrictive on what kinds of conditions are allowed. For 
example, if we have `auto [a, b] = foo(c)`, we may decide to track `c` either 
when both `a` and `b` are tracked, or when only one of them is tracked; I think 
both approaches could be reasonable.

In particular, a simple flag (a-la `Prunable`) is not enough. There needs to be 
a way to communicate *which* value in the state is of interest. The data 
structure describing such conditions functionally may potentially grow pretty 
large and this may be a reason to go imperative but I don't think we'll 
actually run into this problem soon.

> IMO, it will prevent a ton of checkers repeating very similar code: make 
> visitor that will check if some expression is tracked and generate a note out 
> of tag

Yes, absolutely, any solution must eliminate the need for such visitors.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103605/new/

https://reviews.llvm.org/D103605

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D103605: [analyzer] Introduce a new interface for tracking

Reply via email to