On Mon, Dec 04, 2017 at 08:22:48PM +0100, BALATON Zoltan wrote:
> I'm seeing a possible deadlock that I don't know how to debug. Any hint on
> how to find the cause or what should be checked further to identify the
> reason why this is happening and how to fix it is greatly appreciated.
>
> Here is the state of the threads:
>
> (gdb) info thr
>   Id   Target Id         Frame
> * 4    Thread 0x7fffba76c700 (LWP 3445) "qemu-system-ppc" 0x0000555555cbec04
> in worker_thread (opaque=0x7fffe40c9000) at util/thread-pool.c:92
>   3    Thread 0x7fffe8829700 (LWP 3443) "qemu-system-ppc" 0x00007ffff78d267f
> in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   2    Thread 0x7ffff111b700 (LWP 3442) "qemu-system-ppc" 0x00007ffff42cad29
> in syscall () from /lib64/libc.so.6
>   1    Thread 0x7ffff7fc7b00 (LWP 3441) "qemu-system-ppc" 0x00007ffff42c4e31
> in ppoll () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007ffff78d4830 in sem_timedwait () from /lib64/libpthread.so.0
> #1  0x0000555555cc572e in qemu_sem_timedwait (sem=0x7fffe40c9078, ms=10000)
> at util/qemu-thread-posix.c:289
> #2  0x0000555555cbec04 in worker_thread (opaque=0x7fffe40c9000) at
> util/thread-pool.c:92
> #3  0x00007ffff78cd5bd in start_thread () from /lib64/libpthread.so.0
> #4  0x00007ffff42d062d in clone () from /lib64/libc.so.6
> (gdb) thr 3
> [Switching to thread 3 (Thread 0x7fffe8829700 (LWP 3443))]
> #0  0x00007ffff78d267f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007ffff78d267f in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x0000555555cc5458 in qemu_cond_wait (cond=0x555556b47b90,
> mutex=0x5555565b5220 <qemu_global_mutex>) at util/qemu-thread-posix.c:161
> #2  0x00005555557e6690 in qemu_tcg_wait_io_event (cpu=0x7ffff7e20010) at
> cpus.c:1084
> #3  0x00005555557e6f00 in qemu_tcg_rr_cpu_thread_fn (arg=0x7ffff7e20010) at
> cpus.c:1396
> #4  0x00007ffff78cd5bd in start_thread () from /lib64/libpthread.so.0
> #5  0x00007ffff42d062d in clone () from /lib64/libc.so.6
> (gdb) thr 2
> [Switching to thread 2 (Thread 0x7ffff111b700 (LWP 3442))]
> #0  0x00007ffff42cad29 in syscall () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007ffff42cad29 in syscall () from /lib64/libc.so.6
> #1  0x0000555555cc58a7 in qemu_futex_wait (f=0x555556a01134
> <rcu_call_ready_event>, val=4294967295) at include/qemu/futex.h:29
> #2  0x0000555555cc5a74 in qemu_event_wait (ev=0x555556a01134
> <rcu_call_ready_event>) at util/qemu-thread-posix.c:442
> #3  0x0000555555cdd92c in call_rcu_thread (opaque=0x0) at util/rcu.c:249
> #4  0x00007ffff78cd5bd in start_thread () from /lib64/libpthread.so.0
> #5  0x00007ffff42d062d in clone () from /lib64/libc.so.6
> (gdb) thr 1
> [Switching to thread 1 (Thread 0x7ffff7fc7b00 (LWP 3441))]
> #0  0x00007ffff42c4e31 in ppoll () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007ffff42c4e31 in ppoll () from /lib64/libc.so.6
> #1  0x0000555555cbfe86 in qemu_poll_ns (fds=0x555557c17620, nfds=5,
> timeout=29806320) at util/qemu-timer.c:334
> #2  0x0000555555cc0eab in os_host_main_loop_wait (timeout=29806320) at
> util/main-loop.c:255
> #3  0x0000555555cc0f7d in main_loop_wait (nonblocking=0) at
> util/main-loop.c:515
> #4  0x000055555599e2b3 in main_loop () at vl.c:1995
> #5  0x00005555559a6353 in main (argc=21, argv=0x7fffffffdef8,
> envp=0x7fffffffdfa8) at vl.c:4911
>
> Then if I wait a little, thread 4 exits due to sem_timedwait returning -1
> with errno=ETIMEDOUT, leaving the other threads waiting for something to
> happen, but this is apparently a deadlock as it stays stuck here (threads
> 1-3 are still as above). Any idea why this could happen and how to debug it
> further?
Are you using the latest qemu.git/master?

Commit ef6dada8b44e1e7c4bec5c1115903af9af415b50 ("util/async: use
atomic_mb_set in qemu_bh_cancel") fixes hangs that occur with the thread
pool (Thread 4 in your example). I'm not sure if this applies to your hang
though...

It looks like Thread 3 isn't running guest code because the cpu wants to
sleep (is it halted?).

Stefan