On 01.10.2018 at 16:14, Kevin Wolf wrote:
> On 01.10.2018 at 15:03, Peter Maydell wrote:
> > On 28 September 2018 at 15:36, Peter Maydell <peter.mayd...@linaro.org> wrote:
> > > I'm finding that test-bdrv-drain hangs intermittently on my OSX host.
> >
> > Ping? Between this and test-replication I'm finding that my
> > parallel build tests for merges are failing about 50% of the
> > time :-(
>
> Sorry, there wasn't much more than a weekend between your report and
> now.
>
> For the replication one, I think we can just take the AioContext lock in
> the test case while we decide how the API should really be used. I'll
> prepare a fix for that (and hopefully I'll be able to reproduce the
> problem reliably enough to verify the fix).
>
> Max said he could reproduce some hang in test-bdrv-drain (though we
> don't know if this has anything to do with your OS X hang, which looked
> rather odd) and would look into it, but I don't think we know the
> problem yet. I'll try to reproduce that one after fixing the replication
> test.

So I sent two patches for the two test cases that should fix the bugs
that made the tests fail relatively frequently.

I can still reproduce another hang, which is a bit mysterious to me:

Thread 2 (Thread 3321.3818):
#0  0x00007f2ebbdcc4e9 in syscall () from /lib64/libc.so.6
#1  0x00005594d095690b in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/kwolf/source/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x5594d0bff228 <rcu_call_ready_event>) at util/qemu-thread-posix.c:442
#3  0x00005594d0965f58 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:261
#4  0x00007f2ebc09d36d in start_thread () from /lib64/libpthread.so.0
#5  0x00007f2ebbdd1b4f in clone () from /lib64/libc.so.6

Thread 1 (Thread 3321.3321):
#0  0x00007f2ebc09e89d in pthread_join () from /lib64/libpthread.so.0
#1  0x00005594d0956b6f in qemu_thread_join (thread=thread@entry=0x5594d16bd0b8) at util/qemu-thread-posix.c:565
#2  0x00005594d091f4d9 in iothread_join (iothread=0x5594d16bd0b0) at tests/iothread.c:62
#3  0x00005594d08806cc in test_iothread_common (drain_type=BDRV_DRAIN_ALL, drain_thread=<optimized out>) at tests/test-bdrv-drain.c:763
#4  0x00007f2ebd58e178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#5  0x00007f2ebd58e37b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#6  0x00007f2ebd58e37b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#7  0x00007f2ebd58e51b in g_test_run_suite () from /lib64/libglib-2.0.so.0
#8  0x00007f2ebd58e571 in g_test_run () from /lib64/libglib-2.0.so.0
#9  0x00005594d087a534 in main (argc=<optimized out>, argv=<optimized out>) at tests/test-bdrv-drain.c:1606

This pthread_join() is waiting for a thread that doesn't even exist any
more. I caught the bug in rr and am clearly seeing how the iothread is
notified and terminates. But pthread_join() just doesn't return.

Kevin