On Sat, 23 May 2020 at 17:36, Emilio G. Cota <c...@braap.org> wrote: > > On Fri, May 22, 2020 at 12:07:36 -0400, Robert Foley wrote: > > This patch series continues the work done by Emilio Cota and others to add > > Thread Sanitizer (TSan) support to QEMU. > > > > The starting point for this work was Emilio's branch here: > > https://github.com/cota/qemu/commits/tsan > > specifically this commit: 0be125fc0afd47218b34d2019abdd19b644f3199 > > > > The purpose of this patch is not to fix all the TSan warnings, but to enable > > the TSan support so that QEMU developers can start using the tool. > > We found this tool useful and even ran it on our recent changes in > > the cpu-locks series. > > Clearly there is work to do here to clean up all the warnings. :) > > We have made a start to cleaning up these warnings by getting a VM to boot > > cleanly with no TSan warnings. > > We have also made an effort to introduce enough of the TSan suppression > > mechanisms, so that others can continue this work. > > Thank you for doing this work! Great to see this finally coming along. > > What are your plans wrt the per-cpu-lock branch? Given that some of > the races reported here are fixed there, I think it would make sense to > defer those patches to the per-cpu-lock series (i.e. patches 2/19, parts > of 13/19, and 18/19) so that this series goes in first (it is a lot > less likely to break anything).
I agree with you that we should defer my patches which overlap with the per-cpu-lock patch series. > > Also, I would not worry too much about rushing to bring warnings to > 0; what's important is that with the warnings we now know where to > focus on, and then we can carefully fix races. In particular I think > all annotations we add must have a comment, since otherwise we are > at the risk of blindlessly silencing warnings of real races. In order to re-focus the series a bit, we are planning to drop the annotations from the series. This allows for a bit more focus on enabling TSan and less on bringing the warnings to 0 as you mentioned. At the same time, we're also going to add in a bit more details to testing.rst on how to use the various suppression mechanisms, annotations, blacklist and suppressions.txt. > > > Issues: > > - When running docker-test-quick under TSan there are several tests which > > hang > > - The unit tests which seem to hang under TSan: > > test-char, test-qdev-global-props, and test-qga. > > - If we comment out those tests, check-unit finishes, albeit with > > a couple of warnings. :) > > I've noticed another issue (that I did not notice on my previous > machine), which is that tsan blows up when in qht we lock all > of the bucket's locks. We then get this assert from tsan, since > it has a static limit of 64 mutexes held at any given time: > > FATAL: ThreadSanitizer CHECK failed: > /build/llvm-toolchain-10-yegZYJ/llvm-toolchain-10-10.0.0/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h:67 > "((n_all_locks_)) < > (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" > (0x40, 0x40) <snip> > A workaround for now is to run qemu with TSAN_OPTIONS=detect_deadlocks=0, > although selectively disabling tsan for qht_map_lock/unlock_buckets would > be ideal (not sure if it's possible). We have been using detect_deadlocks=0 to avoid this. We will see if it is possible to disable tsan for just qht_map_lock/unlock_buckets > > Another warning I'm reliably seen is: > WARNING: ThreadSanitizer: double lock of a mutex (pid=489006) > #0 pthread_mutex_lock <null> (qemu-system-aarch64+0x457596) > #1 qemu_mutex_lock_impl > /home/cota/src/qemu/util/qemu-thread-posix.c:79:11 > (qemu-system-aarch64+0xaf7e3c) > > Location is heap block of size 328 at 0x7b4800114900 allocated by main > thread: > #0 calloc <null> (qemu-system-aarch64+0x438b80) > #1 g_malloc0 <null> (libglib-2.0.so.0+0x57d30) > > Mutex M57 (0x7b4800114960) created at: > #0 pthread_mutex_init <null> (qemu-system-aarch64+0x43b74d) > #1 qemu_rec_mutex_init > /home/cota/src/qemu/util/qemu-thread-posix.c:119:11 > (qemu-system-aarch64+0xaf815b) > > But this one seems safe to ignore. > This one we disabled in the suppressions.txt file. TSan reports a double lock even when the mutex is flagged as recursive. Thanks & Regards, -Rob > Thanks, > E.