[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

Donát Nagy via cfe-commits Wed, 16 Oct 2024 09:10:19 -0700

NagyDonat wrote:

> > * I think it's likely (although not guaranteed) that this heuristic would 
> > be helpful for other checkers as well, and if we want to activate it for 
> > all checkers, then it must be done in an eager way (because eagerly sinking 
> > lots of paths is significantly better performance-wise than lazily 
> > discarding all those results).
> 
> Alright. So, if I understand you is not only to fix the OOBv2 checker but 
> also to later expand the scope to the whole engine - and skip exploring paths 
> that were explored before. I think that deserves a deeper discussion, but I 
> also understand that we need implementation experience for experimentation 
> back the RFC with data. This is wasn't clear to me initially. The PR title 
> and summary refereed to suppression, which didn't motivate this change well 
> enough to touch the engine. Last time I've checked the Loop hanging RFC 
> [discuss 
> post](https://discourse.llvm.org/t/loop-handling-improvement-plans/80417/15), 
> I though you finally decided to take @haoNoQ's suggestion - and suppress 
> issues instead of changing exploration strategies (if I understood that 
> correctly).


Currently, my only goal is that I want to stabilize `ArrayBoundV2` ASAP and 
then continue with cleaning up the other bounds-checking checkers. However, I 
think that generalizing this (currently checker-specific) suppression to 
outright skipping those execution paths is a plausible direction of future 
development and I think it would be nice to implement the current goals in a 
way that supports this direction.

(As I said earlier, I'm strongly opposed to a solution where the visitor-based 
suppression is activated by each checker [or many checkers] because then we'd 
be discarding a significant percentage (perhaps even the majority) of work done 
by the analyzer.)

Also when I discussed this patch with @Szelethus he also told that he vaguely 
recalls from old test result reviews that these "weak loop assumptions" also 
cause some false positives in certain stable core checkers, which implies that 
we should implement this suppression in a way that can be generalized for other 
checkers.

> That said, if this patch only want's to suppress reports, then a bug report 
> visitor is the right choice to me. If we shoot for something bigger (and this 
> is just the first step), than it starts to make sense. However, achieving 
> something like that is going to be really really tough, so the chances are 
> slim.

I don't think that discarding the execution paths with these "weak" assumptions 
would be a really big change:
- It only discards paths, so it cannot introduce any false positives (unlike 
the loop widening plans, which performs invalidation).
- I don't think that there is a systemic class of bugs that are only found on 
these "weak assumption execution path" branches _and_ can be distinguished 
algorithmically from the highly unwanted results.

> I wonder if you have plans for doing it incrementally [...]

My roadmap roughly looks like:
- clean up this PR and merge this when it's good enough to be useful;
- refine this heuristic with follow-up commits that provide more accurate 
iteration counts and cover loop conditions that contain short-circuiting 
operators;
- evaluate a change that replaces the ArrayBoundV2-specific result suppression 
with completely discarding the execution paths with "weak" assumptions
- merge this if it turns out to be an improvement.
(I didn't think too much about this yet.)

>  [...] and with options for disabling it while developing and preparing this 
> new version.

Obviously I can put the freshly developed stuff behind analyzer options if 
there's demand for it. In the case of this `ArrayBoundV2` change I did not do 
so, because `ArrayBoundV2` is an alpha checker and it's very noisy without this 
patch, but if I extend this to stable checkers I'll probably introduce an 
option.

> [RE `OrigCursor` trickery] 1st, if we always had a ExplodedNode tag for 
> taking an **eager** decision (and some similar tag when eager assumptions are 
> disabled), then I wouldn't need that hack.

I think "did we make a new assumption here?" is canonically answered by "does 
this node have two (or more) children?" and this information shouldn't be 
duplicated elsewhere.

Obviously, it's fine to have a tag which e.g. marks that "here we make an 
_eager_ assumption" (or more precisely those tags currently means that "_if we 
made an assumption here, then_ it's an eager assumption" because they're added 
even if there isn't actual bifurcation) -- but we should not aim to record all 
assumptions as tags.

>  [RE `OrigCursor` trickery] 2nd, It's a convenient lie to have a stripped 
> single-path "exploded graph" from a BugReport visitor, because then it's 
> unambiguous what parent nodes it chooses, thus developers can focus on what 
> matters the most. However, like in this case, it would be nice if we could do 
> arbitrary things. If we had a better API doing this, it wouldn't feel odd 
> anymore. So there are ways to improve this.

I think the natural way to represent an execution path would be just array of 
`ExplodedNode`s and the checkers could just use its (completely ordinary) 
iterators to follow the bug path. This mutilation of the exploded graph seems 
to be a really stupid idea (which probably means that I don't see some 
important goal  e.g. -- perhaps they want to release the memory used to store 
other nodes?).

> Yes, I agree that this "try all subexpressions" (that I implemented) isn't 
> bulletproof. However, I only did it in slightly more than a day. I'd say, if 
> this is a concern, I can give it a bit of a polishing, and put it to the 
> bench to see how it works in life.

I don't think that this is a significant concern, your visitor seems to be good 
enough for practical use. I only highlighted this part to show that the two 
implementations are roughly equal in this area (they are both good enough in 
practice, but both of them have inelegant corner cases).

> > @steakhal @ anybody else Is my eager implementation acceptable for you (in 
> > theory, assuming that it's cleaned up sufficiently)? If yes, I'll start to 
> > work on implementing the suggestions of @isuckatcs and any other 
> > observations.
> 
> I don't think we reached consensus yet.

I see. What can I do to approach consensus? Is there some aspect of either of 
the implementations that we should discuss in more detail? Or should we ask 
other contributors?

> I'm less concerned about the details, because I know you can do it. This is 
> why I didn't look into the details of this PR, or commented nits so far.

Understandable, I'm also delaying detail-oriented work (on both alternative 
implementations) until we decide on the way forward.

https://github.com/llvm/llvm-project/pull/109804
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

Reply via email to