On Tue, Jul 05, 2022 at 12:28:55PM +0100, Alberto Faria wrote:
> On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé <berra...@redhat.com> wrote:
> >   for i in `git ls-tree --name-only -r HEAD:`
> >   do
> >     clang-tidy $i 1>/dev/null 2>&1
> >   done
>
> All of those invocations are probably failing quickly due to missing
> includes and other problems, since the location of the compilation
> database and some other arguments haven't been specified.
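For reference, a minimal corrected sketch of that loop, assuming a meson
build directory named 'build' that holds compile_commands.json, and
restricting the listing to C sources, would be something like:

```shell
# Sketch only: point clang-tidy at the compilation database with -p,
# and keep just the .c files from the git listing ('build' is an
# assumption about the configured build directory name).
for f in $(git ls-tree --name-only -r HEAD: | grep '\.c$')
do
    clang-tidy --quiet -p build "$f"
done
```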
Oops yes, I was waaaay too minimalist in testing that.

> Accounting for those problems (and enabling just one random C++ check):
>
>   $ time clang-tidy -p build \
>       --extra-arg-before=-Wno-unknown-warning-option \
>       --extra-arg='-isystem [...]' \
>       --checks='-*,clang-analyzer-cplusplus.Move' \
>       $( find block -name '*.c' )
>   [...]
>
>   real    3m0.260s
>   user    2m58.041s
>   sys     0m1.467s

That only analyses the block tree, but if we consider a static analysis
framework desirable for the whole of qemu.git, let's see the numbers for
everything. What follows was done on my P1 Gen2 thinkpad with 6 cores /
12 threads, where I use 'make -j 12' normally.

First, as a benchmark, a full compile of the whole of QEMU (with GCC) on
Fedora 36 x86_64:

  => 14 minutes

I find this waaaay too slow though, so I typically configure QEMU with
--target-list=x86_64-softmmu, since that suffices 90% of the time:

  => 2 minutes

A 'make check' on this x86_64-only build takes another 2 minutes.

Now, a static analysis baseline across the whole tree with the default
checks enabled:

  $ clang-tidy --quiet -p build \
      $(git ls-tree -r --name-only HEAD: | grep '\.c$')

  => 45 minutes

Wow, I wasn't expecting it to be that slow! Let's restrict it to just
the block/ dir:

  $ clang-tidy --quiet -p build $(find block -name '*.c')

  => 4 minutes

And further restrict to just one simple check:

  $ clang-tidy --quiet --checks='-*,clang-analyzer-cplusplus.Move' \
      -p build $(find block -name '*.c')

  => 2 minutes 30

Extrapolating, just that single check would probably work out at around
30 minutes for the whole tree. Overall this isn't cheap, and it is in
the same order of magnitude as a full compile. I guess this shouldn't be
that surprising really.

> Single-threaded static-analyzer.py without any checks:
>
>   $ time ./static-analyzer.py build block -j 1
>   Analyzed 79 translation units in 16.0 seconds.
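Since clang-tidy itself only uses one thread per invocation, the obvious
mitigation on a 12-thread box is to fan out one process per file. LLVM
ships a run-clang-tidy wrapper for exactly this; a sketch, where the
'build' directory and the single enabled check simply mirror the
examples above:

```shell
# Sketch: run-clang-tidy reads compile_commands.json from the -p
# directory and runs one clang-tidy process per matching file across
# -j parallel jobs. The final argument is a regex over file paths.
run-clang-tidy -quiet -p build -j "$(nproc)" \
    -checks='-*,clang-analyzer-cplusplus.Move' \
    'block/.*\.c$'
```

That should cut the wall-clock numbers above roughly by the core count,
though total CPU time spent is of course unchanged.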
>
>   real    0m16.665s
>   user    0m15.967s
>   sys     0m0.604s
>
> And with just the 'return-value-never-used' check enabled for a
> somewhat fairer comparison:
>
>   $ time ./static-analyzer.py build block -j 1 \
>       -c return-value-never-used
>   Analyzed 79 translation units in 61.5 seconds.
>
>   real    1m2.080s
>   user    1m1.372s
>   sys     0m0.513s
>
> Which is good news!

On my machine, a whole-tree analysis allowing parallel execution (IIUC
no -j arg means use all cores):

  $ ./static-analyzer.py build \
      $(git ls-tree -r --name-only HEAD: | grep '\.c$')

  => 13 minutes

Or just the block layer:

  $ ./static-analyzer.py build $(find block -name '*.c')

  => 45 seconds

So your script is faster than clang-tidy, which suggests Python probably
isn't the dominating factor in speed, at least at this point in time.
Still, a full-tree analysis time of 13 minutes, compared to my normal
'make' time of 2 minutes, is an order of magnitude more.

One thing I noticed when doing this is that we can only really do static
analysis of files that we can actually compile on the host. IOW, on a
Linux host we don't get static analysis of code that is targeted at
FreeBSD / Windows platforms. Obvious really, since libclang has to do a
full parse, and this will lack header files for those platforms. That's
just the tradeoff you have to accept when using a compiler for static
analysis, vs a tool that does "dumb" string-based regex matching to
detect mistakes. Both kinds of tool likely have their place for
different tasks.

Overall I think a libclang-based analysis tool will be useful, but I
can't see us enabling it as a standard part of 'make check' given the
time penalty. It feels like something that'll have to be opt-in to a
large degree for regular contributors. In terms of gating CI, though, it
is less of an issue, since we massively parallelize jobs.
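One way opt-in use might stay tolerable for contributors is to analyse
only the C files a series actually touches, so the libclang cost scales
with the patch rather than the tree. A sketch, where the 'master'
merge base and the 'build' directory are both assumptions:

```shell
# Sketch: collect C files changed relative to master and feed only
# those to the analyser ('master' and 'build' are assumptions about
# the local setup).
changed=$(git diff --name-only master... -- '*.c')
if [ -n "$changed" ]; then
    ./static-analyzer.py build $changed
fi
```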
As long as we have a dedicated build job just for running this static
analysis check in isolation, and NOT as 'make check' in all existing
jobs, it can happen in parallel with all the other build jobs, and we
won't notice the speed.

In summary, I think this approach is viable despite the speed penalty,
provided we don't wire it into 'make check' by default.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|