[clang] [analyzer][Solver] Early return if sym is concrete on assuming (PR #115579)

Ding Fei via cfe-commits Tue, 12 Nov 2024 19:40:17 -0800

danix800 wrote:

> First, let me thank you for posting a high quality patch like this.
> 
> `ProgramState::isPosteriorlyOverconstrained()` was introduced to carry the 
> fact that even the parent state was infeasible. Checkers were written in the 
> sense that `ProgramState::assume()` will return at least one valid State. 
> This was true for a really long time, until we improved the Solver with some 
> simplification triggered by perfectly constraining symbols. This meant that 
> at that point we could have situations where the Solver disproves both the 
> true and the false states, thus force us to violate the contract of 
> `ProgramState::assume` by returning two null states. This caused crashes. To 
> illustrate the issue consider this:
> 
> ```
> // Let's say x could be either 0 or 1 here.
> if (/*some fancy infeasible constraint dependent on x */) {
>   if (x == 0) { } else { }
> }
> ```
> 
> Let's pretend that at the first `if` we barely couldn't prove that there is 
> no value `x` that would satisfy the constraint. At the `if (x == 0)` we would 
> split the path 2 ways, where x is 0 and on the other one is 1. The Solver 
> would try to simplify the constraints and realize that the State where x is 1 
> has contradictions; and then later also realize that the other State is also 
> infeasible. It's only possible if even the parent State was infeasible.
> 
> So, we have to return a "valid" state just to discard everything the checker 
> would do. This wasted work is the lesser evil. Basically, this is the story 
> of `isPosteriorlyOverconstrained()`, and the why behind it.
>


Thanks for these helpful details!

> So, the fact that `assume()` never returns two null states is a blissful lie 
> to ease checker writing.
> 
> Now, why is this issue interesting? It's because the checkers can still 
> observe the side effects of this machinery I shared when they chain 
> ExplodedNodes returned by `addTransition()`. This breaks the illusion, 
> leading to crashes as you shared.
> 
> Let's say we have a checker that adds some assumptions, and the `assume` call 
> returned a single valid state. It's would be surprising that if I add a 
> transition to it, suddenly it sinks. Even though that's technically the right 
> thing to do, the checker did nothing that would imply sinking the path. This 
> doesn't mean that the defensive `isSink` call in the StdLibFnChecker is 
> unjustified. I'm okay with that. However, this would imply that the rest of 
> the API uses of `addTransition` where we chain the ExplodedNodes are also 
> left vulnerable to this issue. There are probably a lot less checkers doing 
> this sort of chaining, so checking each callsite may be a valid option. 
> However, this would raise a bar for using this API for newcomers, which I 
> don't really like.

I think there's no clear boundaries between checkers and the engine. Checkers 
can decide whether the engine should
stop, a state should be splitted or transitioned to new one, or do nothing at 
all.

So contracts are important, I think we should clarify these contracts and make 
more if needed, but not compromise to
some not-well-written checkers. This also means that checker should be aware of 
the existence of posteriorly-overconstrained states, especially when they try 
to chain those states.

> 
> Consequently, I think this patch would mask the issue. We should rather 
> prevent breaking the illusion by somehow allowing subsequent `addTransitions` 
> even if `isPosteriorlyOverconstrained()` is true.

How about: whenever a checker tries to chain states by itself, it should be 
more careful on node validation. Engine
responds with a failed assertion in this case, which drives us to spot the 
issue and fix it. If `isPosteriorlyOverconstrained()` 
nodes are allowed then we might lose this chance to fix issues of both the 
checker and the solver.

`isSink()` testing is suffice for this case, but might not be clear enough 
generally. I was suprised by this precedingly sink node too.

As an anology, when checker try to transition to an **unchanged** state, engine 
will return the predecessor
as a prevention. If checker needs to chain states, it's a waste for the checker 
to keep working on this state without
testing that the returned node is the predecessor (nothing is changed).

We might change the engine to return null node intentionaly if it's marked as 
sink by `isPosteriorlyOverconstrained()`.
It's more clear. But the checker still needs to test it. Whenever we generates 
error node for bug reporting, we test if the
new node is actually generated:

```c
      if (ExplodedNode *N = C.generateErrorNode(NewState, NewNode))
        reportBug(Call, N, Constraint.get(), Summary, C);
```

This is another similar contract between checkers and the engine. In this 
sense, how about testing like this in this case:
```c
     if (!NewNode || NewNode->isSink())
        break;
```

> However, this would imply that the rest of the API uses of `addTransition` 
> where we chain the ExplodedNodes are also left vulnerable to this issue. 
> There are probably a lot less checkers doing this sort of chaining, so 
> checking each callsite may be a valid option. 

Maybe case-by-case fixing is enough.

https://github.com/llvm/llvm-project/pull/115579
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer][Solver] Early return if sym is concrete on assuming (PR #115579)

Reply via email to