[clang] [llvm] [analyzer] Consolidate array bound checkers (PR #125534)

Donát Nagy via cfe-commits Tue, 04 Feb 2025 06:44:56 -0800

NagyDonat wrote:

I evaluated a sample of 10 random ArrayBoundV2 results from FFMPEG and I found 
the following:
- [Unjustified assumption of third 
iteration](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=ffmpeg_n4.3.1_edonnag_llvm-main_1883de3&newcheck=ffmpeg_n4.3.1_edonnag_llvm-main_f8aa6f1&diff-type=New&report-id=6936068&report-hash=4849805b289670414ad1c51975c2eeac&report-filepath=ffmpeg%2Flibavfilter%2Faf_amultiply.c)
 because my recent "[don't assume third 
iteration](https://github.com/llvm/llvm-project/pull/119388)" change doesn't 
handle the case when there is a short-circuiting operator in the loop condition.
  - IIRC this covers several (5-10??) false positives in this particular 
project, but should be significantly less common elsewhere (FFMPEG has an 
unusually high concentration of 2-element arrays).
  - Eliminating these would be a technically complex task, but has no 
fundamental obstacles. However they're rare enough that I'm not sure if they're 
worth the effort. 
- [Non-realistic 
overflow](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=ffmpeg_n4.3.1_edonnag_llvm-main_1883de3&newcheck=ffmpeg_n4.3.1_edonnag_llvm-main_f8aa6f1&diff-type=New&report-id=6936379&report-hash=10c43369fbbd8b9359afdebd019643ee&report-filepath=ffmpeg%2Flibavcodec%2Fansi.c)
 where the analyzer assumes that the code was able to read $\approx 2^{31}$ 
arguments from the input (which can't happen, but the analyzer cannot deduce 
this precondition of the analyzed function).
- [Correlated if 
conditions](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=ffmpeg_n4.3.1_edonnag_llvm-main_1883de3&newcheck=ffmpeg_n4.3.1_edonnag_llvm-main_f8aa6f1&diff-type=New&page=2&report-id=6936490&report-hash=7765e71a226bf336d525811ccb01c476&report-filepath=ffmpeg%2Flibavcodec%2Fhevc_cabac.c):
 assuming that the `av_clip` call in line 1103 returns 
`-s->ps.sps->qp_bd_offset` and then assuming that this result is `> 51` 
produces an array underflow. I suspect that this can't happen (e.g. I'd guess 
that `s->ps.sps->qp_bd_offset` is always nonnegative), but I don't think that 
Z3 crosscheck would be helpful.
- [A classical "assuming skipped loop" 
error](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=ffmpeg_n4.3.1_edonnag_llvm-main_1883de3&newcheck=ffmpeg_n4.3.1_edonnag_llvm-main_f8aa6f1&diff-type=New&page=2&report-id=6936516&report-hash=51d11525a3b6b456f1076aed8b790e61&report-filepath=ffmpeg%2Flibavcodec%2Fhevcpred_template.c)
 which would be silenced by [the new `assume-one-iteration` 
option](https://github.com/llvm/llvm-project/pull/125494) or a well-placed 
assertion. (Enabling that option would silence 34 of the 187 ArrayBoundV2 
errors on this project.)
- [Technically justified, but unwanted assumption of a third 
iteration](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=ffmpeg_n4.3.1_edonnag_llvm-main_1883de3&newcheck=ffmpeg_n4.3.1_edonnag_llvm-main_f8aa6f1&diff-type=New&page=3&report-id=6936484&report-hash=f8cf2f54adb6af9636d26f5275b959fc&report-filepath=ffmpeg%2Flibavcodec%2Fhapdec.c):
 first there is an `if` where the analyzer can assume `ctx->texture_count != 2` 
so later when the analyzer assumes that `ctx->texture_count >= 2` (i.e. it may 
enter the second iteration of the loop), it gets "enough fuel" to also enter 
the third iteration (which is an overflow error). This is a correlated 
assumption error (similar to the "correlated if" issue), so solving it is 
blocked by fundamental theoretical obstacles.
- [Yet another "assuming skipped loop" 
issue](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=ffmpeg_n4.3.1_edonnag_llvm-main_1883de3&newcheck=ffmpeg_n4.3.1_edonnag_llvm-main_f8aa6f1&diff-type=New&page=3&report-id=6936513&report-hash=30e1c1cefaad9086c13fdb53692a0ff4&report-filepath=ffmpeg%2Flibavcodec%2Fhevcpred_template.c),
 very similar (but somewhat more complex) than the previous one.
- [Technically a true 
positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=ffmpeg_n4.3.1_edonnag_llvm-main_1883de3&newcheck=ffmpeg_n4.3.1_edonnag_llvm-main_f8aa6f1&diff-type=New&page=5&report-id=6936917&report-hash=715cc85d94e2fcebde8b49ec355401f9&report-filepath=ffmpeg%2Ffftools%2Fffmpeg_opt.c)
 where the analyzer assumes that the second element of `static const char 
*opt_name_reinit_filters[] = {"reinit_filter", NULL};` may be non-`NULL` when 
the analyzed function is executed. (For a programmer it's obvious that this is 
a constant, but the analyzer cannot deduce this. The programmer could resolve 
this by adding an extra `const` qualifier.)
- The remaining three issues in my random sample are all duplicates of this one 
-- they appear as separate reports because the error is within a **macro that's 
expanded in several locations and apparently produces 80 (!) ArrayBoundV2 
reports** (which correspond to other analogous arrays instead of 
`opt_name_reinit_filters`).


-------

These reports don't look like solver issues, I don't think that Z3 refutation 
or similar improvements would help with any of them. (:thinking:  In fact, 
perhaps Z3 refutation was already enabled -- I don't know what's the default 
setting in our CI.) 

In my opinion the only actionable conclusion is that we should make report 
deduplication more aggressive around macros -- e.g. I could imagine a heuristic 
that uses the location within the macro (instead of the location of the 
particular macro expansion) for deduplication if the complexity/length of the 
macro is above a certain threshold. (Otherwise unrelated reports "coming from" 
a small macro like `MAX` or something similar shouldn't be conflated, but 
reports coming from the same part of a "big" macro should be unified.) This 
would consolidate almost half of the reports into a single report (which is 
arguably a true positive, so shouldn't be silenced). However this improvement 
is independent of ArrayBoundV2 and IMO shouldn't block this PR.

After that, `assume-one-iteration` can eliminate 34 additional reports which 
are mostly unwanted -- however that change would also cause the loss of true 
positives (on projects that are less stable and have true positives...), so 
it's likely that we don't want to enable it.

The remaining 74 reports are probably false positives that are caused by 
incompatible assumptions ("correlated if" issues), invisible preconditions (the 
programmer knows something about some argument/input of the toplevel analyzed 
function, but the analyzer cannot deduce this) or other random issues.

https://github.com/llvm/llvm-project/pull/125534
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [analyzer] Consolidate array bound checkers (PR #125534)

Reply via email to