On 18/07/2017 15:56, Laurent Vivier wrote:
> On 18/07/2017 15:07, Laurent Vivier wrote:
>> On 21/06/2017 15:23, Gerd Hoffmann wrote:
>>> Drop the temporary workaround for the broken display updates.
>>> All display adapters are updated, so this should be safe without
>>> causing regressions.
>>
>> It seems it breaks QMP command 'migrate "exec:cat>mig"'.
>>
>> The command hangs and doesn't create the file.
>>
>> It happens with qemu-system-ppc64 on x86 (so TCG mode).
>>
>> my command:
>>
>> ./ppc64-softmmu/qemu-system-ppc64 -serial mon:stdio
>>
>> I wait SLOF fails to find an OS, and:
>>
>> Ctrl-a c
>> (qemu) migrate -d "exec:cat>mig"
>>
>> The file is not created and the command hangs:
>>
>> #0 in __lll_lock_wait
>> #1 in pthread_mutex_lock
>> #2 in qemu_mutex_lock
>> #3 in rcu_init_lock
>> #4 in fork
>> #5 in qemu_fork
>> #6 in qio_channel_command_new_spawn
>> #7 in exec_start_outgoing_migration
>> #8 in qmp_migrate
>> ...
>>
>> It looks like a deadlock.
>
> I think this patch is not the cause of the problem, the one it removes
> just unlocks the deadlock by playing with locks.
>
> We have a rcu_init_lock() on fork() because of:
>
> utils/rcu.c:
>
> static void __attribute__((__constructor__)) rcu_init(void)
> {
> #ifdef CONFIG_POSIX
>     pthread_atfork(rcu_init_lock, rcu_init_unlock, rcu_init_unlock);
> #endif
>     rcu_init_complete();
> }
>
> The QMP thread hangs on:
>
> (gdb) p rcu_sync_lock
> $1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 23865,
> __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {
> __prev = 0x0, __next = 0x0}},
> __size = "\002\000\000\000\000\000\000\000\071]\000\000\001", '\000'
> <repeats 26 times>, __align = 2}, initialized = true}
>
>
> The lock is already taken by thread 2:
>
> (gdb) info thread
>   Id   Target Id         Frame
>   1    Thread 0x7f1cf02fdf00 (LWP 23864) "qemu-system-ppc"
> 0x00007f1cd914b37d in __lll_lock_wait () from /lib64/libpthread.so.0
> * 2    Thread 0x7f1cc9762700 (LWP 23865) "qemu-system-ppc"
> 0x00007f1cd410daa9 in syscall () from /lib64/libc.so.6
>   3    Thread 0x7f1cbf8d5700 (LWP 23866) "qemu-system-ppc"
> 0x00007f1cd914b37d in __lll_lock_wait () from /lib64/libpthread.so.0
>
> (gdb) bt
> #0 0x00007f1cd410daa9 in syscall () at /lib64/libc.so.6
> #1 0x000055ab028ddda2 in qemu_futex_wait
> #2 0x000055ab028ddda2 in qemu_event_wait
> #3 0x000055ab028eda2b in wait_for_readers
> #4 0x000055ab028eda2b in synchronize_rcu
> #5 0x000055ab028edc5b in call_rcu_thread
> #6 0x00007f1cd914273a in start_thread ()
> #7 0x00007f1cd4113e0f in clone ()
>
> So it seems we cannot fork() from QMP?
> [cc: Paolo]

There have been other similar bugs, as David reported. The plan was to
disable pthread_atfork soon after daemonize (basically assuming that
after daemonize fork is immediately followed by exec), but I've been
lazy and never finished those patches. Looks like it's time.

Paolo
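
For illustration only, here is a rough sketch of what "disabling pthread_atfork" could look like. It is not the unfinished patches and not QEMU code: every demo_* name is hypothetical, and demo_lock merely stands in for the lock that rcu_init_lock() takes. The handlers stay registered (POSIX offers no way to unregister them) but become no-ops once a flag is cleared, so a fork() that is immediately followed by exec no longer waits on the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the lock taken by rcu_init_lock(). */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical switch: when false, the atfork handlers do nothing. */
static bool demo_atfork_enabled = true;

/* Remembers whether prepare took the lock, so that the parent and
 * child handlers only release what was actually acquired. */
static bool demo_locked_in_prepare;

/* Runs in the forking thread just before fork(). */
static void demo_prepare(void)
{
    demo_locked_in_prepare = demo_atfork_enabled;
    if (demo_locked_in_prepare) {
        pthread_mutex_lock(&demo_lock);
    }
}

/* Runs after fork(), once in the parent and once in the child. */
static void demo_release(void)
{
    if (demo_locked_in_prepare) {
        pthread_mutex_unlock(&demo_lock);
        demo_locked_in_prepare = false;
    }
}

/* Register the handlers once, e.g. from an init function. */
static void demo_install_handlers(void)
{
    int rc = pthread_atfork(demo_prepare, demo_release, demo_release);
    assert(rc == 0);
    (void)rc;
}

/* Fork a child that exits right away (standing in for fork + exec).
 * Returns the child's exit status, or -1 on error. */
static int demo_fork_and_wait(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would run demo_install_handlers() at startup and clear demo_atfork_enabled once it can assume that every later fork() is followed by exec. This sketch does not address the case where a forked child keeps running without exec and still needs the lock in a consistent state; that case would need the handlers left enabled.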