On Tue, Jul 05, 2022 at 12:28:55PM +0100, Alberto Faria wrote:
> On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé <berra...@redhat.com> wrote:
> >   for i in `git ls-tree --name-only -r HEAD:`
> >   do
> >     clang-tidy $i 1>/dev/null 2>&1
> >   done
>
> All of those invocations are probably failing quickly due to missing
> includes and other problems, since the location of the compilation
> database and some other arguments haven't been specified.
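For reference, a minimal corrected sketch of that loop, assuming a meson
build directory named 'build' that holds compile_commands.json, and
restricting the listing to C sources, would be something like:

```shell
# Sketch only: point clang-tidy at the compilation database with -p,
# and keep just the .c files from the git listing ('build' is an
# assumption about the configured build directory name).
for f in $(git ls-tree --name-only -r HEAD: | grep '\.c$')
do
    clang-tidy --quiet -p build "$f"
done
```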
Oops yes, I was waaaay too minimalist in testing that.

> Accounting for those problems (and enabling just one random C++ check):
>
>   $ time clang-tidy -p build \
>       --extra-arg-before=-Wno-unknown-warning-option \
>       --extra-arg='-isystem [...]' \
>       --checks='-*,clang-analyzer-cplusplus.Move' \
>       $( find block -name '*.c' )
>   [...]
>
>   real    3m0.260s
>   user    2m58.041s
>   sys     0m1.467s

That only analyses the block tree, but if we consider a static analysis
framework desirable for the whole of qemu.git, let's see the numbers for
everything. What follows was done on my P1 Gen2 thinkpad with 6 cores /
12 threads, where I use 'make -j 12' normally.

First, as a benchmark, a full compile of the whole of QEMU (with GCC) on
Fedora 36 x86_64:

  => 14 minutes

I find this waaaay too slow though, so I typically configure QEMU with
--target-list=x86_64-softmmu, since that suffices 90% of the time:

  => 2 minutes

A 'make check' on this x86_64-only build takes another 2 minutes.

Now, a static analysis baseline across the whole tree with the default
checks enabled:

  $ clang-tidy --quiet -p build \
      $(git ls-tree -r --name-only HEAD: | grep '\.c$')

  => 45 minutes

Wow, I wasn't expecting it to be that slow! Let's restrict it to just
the block/ dir:

  $ clang-tidy --quiet -p build $(find block -name '*.c')

  => 4 minutes

And further restrict to just one simple check:

  $ clang-tidy --quiet --checks='-*,clang-analyzer-cplusplus.Move' \
      -p build $(find block -name '*.c')

  => 2 minutes 30

Extrapolating, just that single check would probably work out at around
30 minutes for the whole tree. Overall this isn't cheap, and it is in
the same order of magnitude as a full compile. I guess this shouldn't be
that surprising really.

> Single-threaded static-analyzer.py without any checks:
>
>   $ time ./static-analyzer.py build block -j 1
>   Analyzed 79 translation units in 16.0 seconds.
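Since clang-tidy itself only uses one thread per invocation, the obvious
mitigation on a 12-thread box is to fan out one process per file. LLVM
ships a run-clang-tidy wrapper for exactly this; a sketch, where the
'build' directory and the single enabled check simply mirror the
examples above:

```shell
# Sketch: run-clang-tidy reads compile_commands.json from the -p
# directory and runs one clang-tidy process per matching file across
# -j parallel jobs. The final argument is a regex over file paths.
run-clang-tidy -quiet -p build -j "$(nproc)" \
    -checks='-*,clang-analyzer-cplusplus.Move' \
    'block/.*\.c$'
```

That should cut the wall-clock numbers above roughly by the core count,
though total CPU time spent is of course unchanged.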
>
>   real    0m16.665s
>   user    0m15.967s
>   sys     0m0.604s
>
> And with just the 'return-value-never-used' check enabled for a
> somewhat fairer comparison:
>
>   $ time ./static-analyzer.py build block -j 1 \
>       -c return-value-never-used
>   Analyzed 79 translation units in 61.5 seconds.
>
>   real    1m2.080s
>   user    1m1.372s
>   sys     0m0.513s
>
> Which is good news!

On my machine, a whole-tree analysis allowing parallel execution (IIUC
no -j arg means use all cores):

  $ ./static-analyzer.py build \
      $(git ls-tree -r --name-only HEAD: | grep '\.c$')

  => 13 minutes

Or just the block layer:

  $ ./static-analyzer.py build $(find block -name '*.c')

  => 45 seconds

So your script is faster than clang-tidy, which suggests Python probably
isn't the dominating factor in speed, at least at this point in time.
Still, a full-tree analysis time of 13 minutes, compared to my normal
'make' time of 2 minutes, is an order of magnitude more.

One thing I noticed when doing this is that we can only really do static
analysis of files that we can actually compile on the host. IOW, on a
Linux host we don't get static analysis of code that is targeted at
FreeBSD / Windows platforms. Obvious really, since libclang has to do a
full parse, and this will lack header files for those platforms. That's
just the tradeoff you have to accept when using a compiler for static
analysis, vs a tool that does "dumb" string-based regex matching to
detect mistakes. Both kinds of tool likely have their place for
different tasks.

Overall I think a libclang-based analysis tool will be useful, but I
can't see us enabling it as a standard part of 'make check' given the
time penalty. It feels like something that'll have to be opt-in to a
large degree for regular contributors. In terms of gating CI, though, it
is less of an issue, since we massively parallelize jobs.
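One way opt-in use might stay tolerable for contributors is to analyse
only the C files a series actually touches, so the libclang cost scales
with the patch rather than the tree. A sketch, where the 'master'
merge base and the 'build' directory are both assumptions:

```shell
# Sketch: collect C files changed relative to master and feed only
# those to the analyser ('master' and 'build' are assumptions about
# the local setup).
changed=$(git diff --name-only master... -- '*.c')
if [ -n "$changed" ]; then
    ./static-analyzer.py build $changed
fi
```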
As long as we have a dedicated build job just for running this static
analysis check in isolation, and NOT as 'make check' in all existing
jobs, it can happen in parallel with all the other build jobs, and we
won't notice the speed.

In summary, I think this approach is viable despite the speed penalty,
provided we don't wire it into 'make check' by default.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|