Hi Emilio, I've arrived at your patch, noted in the subject, with bisection (please see the bisection log attached).
I'm on RHEL-7.1. Sometimes I have to work with upstream QEMU, and then I use it with my preexistent libvirt guests, pulling QEMU somewhat infrequently. My libvirt-related version numbers are:

  libvirtd:       1.2.8-16.el7_1.3.x86_64
  libvirt-python: 1.2.8-7.el7_1.1.x86_64
  libvirt-g*:     0.1.7-3.el7.x86_64
  virt-manager:   1.1.0-12.el7.noarch

The symptom is that when your patch is built into QEMU, then QEMU starts, but hangs as soon as I click the specific VM's entry in virt-manager's list. In the process list ("ps"), I can then see two qemu processes, parent and child. I saved backtraces for both of them, while they were hung. The command lines are also visible in the attached text files.

The line numbers (i.e. the QEMU binary) match the tree when checked out and built at exactly your patch. (I double checked: if I build at 5243722376^, then it works.) The configure command was:

  ./configure \
    --audio-drv-list=alsa \
    --target-list=x86_64-softmmu,i386-softmmu,aarch64-softmmu \
    --disable-vde \
    --enable-werror \
    --enable-spice \
    --disable-stack-protector \
    --prefix=/opt/qemu-installed \
    --disable-gtk \
    --enable-debug \
    --enable-trace-backends=stderr

I don't think libvirt, or for that matter, any QMP interfaces, have anything to do with this. I rather believe that libvirt invokes QEMU for retrieving the capabilities in a way that exposes a possible problem in your patch. (Hence I provided my libvirt version numbers just to be sure.)

...

In fact I'm confused about your patch. rcu_init() makes sure that at fork(), the parent will first acquire both "rcu_sync_lock" and "rcu_registry_lock". Meaning, no other thread in the parent can hold those mutexen when the parent thread calling fork() actually forks.

Then, in the parent, the original thread simply releases both mutexen, in rcu_init_unlock().

In the child, only the one thread exists that called fork() in the parent. However, that one child thread does own the copies of both mutexen. So it is prudent for the child to release both copies.

Your patch causes "rcu_registry_lock" to be reinitialized in the child, rather than released, plus "rcu_sync_lock" remains untouched (i.e. locked by the one thread that exists in the child). Why is that correct?

(Side note: we're talking process-private, not process-shared mutexen.)

I can be easily wrong, but I don't understand the commit message, and why the patch is correct.

...

Hm, I can see the discussion here:

http://thread.gmane.org/gmane.comp.emulators.qemu/356765/focus=360421

Okay... let me see 24fa90499f... "The problem is that releasing error-checking locks in the child fails under glibc with EPERM". <-- That is a striking surprise to me, but still, the removal of PTHREAD_MUTEX_ERRORCHECK only justifies why your patch would *not* be necessary.

The last paragraph of your email that I linked above talks about a "possibility of corruption". Maybe I've managed to trigger that. If so, I hope it won't be hard to fix up.

...

Hm, apparently Alex had mentioned the same concern as I did now, about ignoring "rcu_sync_lock" in the child, in message <http://thread.gmane.org/gmane.comp.emulators.qemu/356765/focus=360602>. Was that concern cleared up eventually?

Thanks!
Laszlo
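
A minimal sketch of the fork-handler protocol discussed in the message above, under stated assumptions; it is not QEMU's code. The two mutexes are hypothetical stand-ins for "rcu_sync_lock" and "rcu_registry_lock", and the handler names, the policy enum and child_can_take_both_locks() are invented for illustration. It assumes default (non-error-checking) mutexes as implemented by glibc; re-initializing a locked mutex in the child is formally outside what POSIX defines and is shown only because it mirrors what the message says the patch does. The child probes the locks with pthread_mutex_trylock(), so a mutex left locked is reported instead of hanging.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for rcu_sync_lock and rcu_registry_lock. */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* What the child-side fork handler does (illustrative names). */
enum child_policy {
    RELEASE_BOTH,         /* child unlocks both inherited mutex copies */
    REINIT_REGISTRY_ONLY  /* child re-inits one, leaves the other locked */
};
static enum child_policy current_policy;

/* Runs in the forking thread right before fork(): take both locks so
 * that no other thread can hold them at the moment of the fork. */
static void atfork_prepare(void)
{
    pthread_mutex_lock(&sync_lock);
    pthread_mutex_lock(&registry_lock);
}

/* Runs in the parent after fork(): release both locks. */
static void atfork_parent(void)
{
    pthread_mutex_unlock(&registry_lock);
    pthread_mutex_unlock(&sync_lock);
}

/* Runs in the child after fork(): the child's only thread holds copies
 * of both locks, so what happens here decides their state. */
static void atfork_child(void)
{
    if (current_policy == RELEASE_BOTH) {
        pthread_mutex_unlock(&registry_lock);
        pthread_mutex_unlock(&sync_lock);
    } else {
        /* Assumption: works on glibc, not guaranteed by POSIX. */
        pthread_mutex_init(&registry_lock, NULL);
        /* sync_lock is deliberately left untouched, i.e. locked. */
    }
}

/* Forks once under the given policy. Returns 1 if the child could take
 * both locks, 0 if at least one was still locked, -1 on error. */
int child_can_take_both_locks(enum child_policy policy)
{
    static int registered;
    int status;
    pid_t pid;

    if (!registered) {
        if (pthread_atfork(atfork_prepare, atfork_parent, atfork_child)) {
            return -1;
        }
        registered = 1;
    }
    current_policy = policy;

    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* trylock fails with EBUSY on a locked mutex instead of blocking */
        int ok = pthread_mutex_trylock(&sync_lock) == 0 &&
                 pthread_mutex_trylock(&registry_lock) == 0;
        _exit(ok ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status) == 0;
}
```

Calling child_can_take_both_locks(RELEASE_BOTH) and then child_can_take_both_locks(REINIT_REGISTRY_ONLY) from a small main() compares the two child-side behaviours. In the second case the stand-in for "rcu_sync_lock" stays locked in the child, which would make any later blocking lock of it, for instance from the prepare handler of a second fork(), wait forever. That matches the frame in the child's backtrace where rcu_init_lock() blocks on rcu_sync_lock inside fork().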
git bisect start
# bad: [619622424dba749feef752d76d79ef2569f7f250] Merge remote-tracking branch 'remotes/berrange/tags/vnc-crypto-v9-for-upstream' into staging
git bisect bad 619622424dba749feef752d76d79ef2569f7f250
# good: [2b750d9d261bda7f75b39dfc1e1e5f22502929d5] Merge remote-tracking branch 'remotes/aurel/tags/pull-sh4-next-20150913' into staging
git bisect good 2b750d9d261bda7f75b39dfc1e1e5f22502929d5
# bad: [a2aa09e18186801931763fbd40a751fa39971b18] Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
git bisect bad a2aa09e18186801931763fbd40a751fa39971b18
# bad: [0c71d41e2aa3c7356500ae624166f3bb8c201aee] scripts/dump-guest-memory.py: fix after RAMBlock change
git bisect bad 0c71d41e2aa3c7356500ae624166f3bb8c201aee
# good: [3c9589e180d98cdadb143bd2a792fb9d19d9aec6] Move RAMBlock and ram_list to ram_addr.h
git bisect good 3c9589e180d98cdadb143bd2a792fb9d19d9aec6
# bad: [3904e6bf042391abc749d717465022e96e276fc7] cutils: Add qemu_strtoull() wrapper
git bisect bad 3904e6bf042391abc749d717465022e96e276fc7
# bad: [709037636992e9289ce9147e59d56fb35d90b140] linux-user: call rcu_(un)register_thread on pthread_(exit|create)
git bisect bad 709037636992e9289ce9147e59d56fb35d90b140
# bad: [5243722376873a48e9852a58b91f4d4101ee66e4] rcu: init rcu_registry_lock after fork
git bisect bad 5243722376873a48e9852a58b91f4d4101ee66e4
# good: [12a1ddc160cb6a73e8a6c319f3962a20da2cd22f] Makefile.target: include top level build dir in vpath
git bisect good 12a1ddc160cb6a73e8a6c319f3962a20da2cd22f
# first bad commit: [5243722376873a48e9852a58b91f4d4101ee66e4] rcu: init rcu_registry_lock after fork
UID        PID  PPID  C STIME TTY          TIME CMD
qemu     17305  1752  0 13:24 ?        00:00:00 /opt/qemu-installed/bin/qemu-system-i386 -S -no-user-config -nodefaults -nographic -M none -qmp unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile /var/lib/libvirt/qemu/capabilities.pidfile -daemonize

(gdb) thread apply all bt full

Thread 2 (Thread 0x7fa9c3db7700 (LWP 17306)):
#0  0x00007fa9c7dda949 in syscall () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fa9cebc0f73 in futex_wait (ev=0x7fa9cf5245a4 <rcu_call_ready_event>, val=4294967295) at util/qemu-thread-posix.c:301
No locals.
#2  0x00007fa9cebc106a in qemu_event_wait (ev=0x7fa9cf5245a4 <rcu_call_ready_event>) at util/qemu-thread-posix.c:408
        value = 1
#3  0x00007fa9cebd4666 in call_rcu_thread (opaque=0x0) at util/rcu.c:254
        tries = 0
        n = 0
        node = 0x7fa9ce712990
#4  0x00007fa9cd2fedf5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#5  0x00007fa9c7de01ad in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (Thread 0x7fa9ce6f2bc0 (LWP 17305)):
#0  0x00007fa9cd30525d in read () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fa9ce915c7c in os_daemonize () at os-posix.c:223
        status = 0 '\000'
        len = 140733260032912
        pid = 17307
        fds = {4, 5}
#2  0x00007fa9ce92a803 in main (argc=12, argv=0x7fff03f8efd8, envp=0x7fff03f8f040) at vl.c:4034
        i = 0
        snapshot = 0
        linux_boot = 0
        initrd_filename = 0x7fa9d0749eb0 "îkÅΩ\177"
        kernel_filename = 0x7fa9d0749ea0 ""
        kernel_cmdline = 0x7fa9cebd4e20 <__libc_csu_init> "AWA\211ÿAVI\211öAUI\211ÕATL\215%"
        boot_order = 0x0
        boot_once = 0x0
        ds = 0x7fa9cec56d38
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        hda_opts = 0x0
        opts = 0x7fa9d0790e90
        machine_opts = 0xfffffffe7fffffff
        icount_opts = 0x0
        olist = 0x7fa9cf03b140 <qemu_machine_opts>
        optind = 12
        optarg = 0x7fa9d0790f40 "none"
        loadvm = 0x0
        machine_class = 0x7fa9d077a160
        cpu_model = 0x0
        vga_model = 0x0
        qtest_chrdev = 0x0
        qtest_log = 0x0
        pid_file = 0x7fff03f8ff59 "/var/lib/libvirt/qemu/capabilities.pidfile"
        incoming = 0x0
        show_vnc_port = 0
        defconfig = true
        userconfig = false
        log_mask = 0x0
        log_file = 0x0
        mem_trace = {malloc = 0x7fa9ce9276a2 <malloc_and_trace>, realloc = 0x7fa9ce9276d7 <realloc_and_trace>, free = 0x7fa9ce92771b <free_and_trace>, calloc = 0x0, try_malloc = 0x0, try_realloc = 0x0}
        trace_events = 0x0
        trace_file = 0x0
        maxram_size = 134217728
        ram_slots = 0
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
        __func__ = "main"
UID        PID  PPID  C STIME TTY          TIME CMD
qemu     17307 17305  0 13:24 ?        00:00:00 /opt/qemu-installed/bin/qemu-system-i386 -S -no-user-config -nodefaults -nographic -M none -qmp unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile /var/lib/libvirt/qemu/capabilities.pidfile -daemonize

(gdb) thread apply all bt full

Thread 1 (Thread 0x7fa9ce6f2bc0 (LWP 17307)):
#0  0x00007fa9cd304f7d in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fa9cd300d32 in _L_lock_791 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00007fa9cd300c38 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007fa9cebc0ad1 in qemu_mutex_lock (mutex=0x7fa9cf524560 <rcu_sync_lock>) at util/qemu-thread-posix.c:73
        err = 0
        __func__ = "qemu_mutex_lock"
#4  0x00007fa9cebd491a in rcu_init_lock () at util/rcu.c:329
No locals.
#5  0x00007fa9c7da7512 in fork () from /lib64/libc.so.6
No symbol table info available.
#6  0x00007fa9ce915cef in os_daemonize () at os-posix.c:240
        pid = 0
        fds = {4, 5}
#7  0x00007fa9ce92a803 in main (argc=12, argv=0x7fff03f8efd8, envp=0x7fff03f8f040) at vl.c:4034
        i = 0
        snapshot = 0
        linux_boot = 0
        initrd_filename = 0x7fa9d0749eb0 "îkÅΩ\177"
        kernel_filename = 0x7fa9d0749ea0 ""
        kernel_cmdline = 0x7fa9cebd4e20 <__libc_csu_init> "AWA\211ÿAVI\211öAUI\211ÕATL\215%"
        boot_order = 0x0
        boot_once = 0x0
        ds = 0x7fa9cec56d38
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        hda_opts = 0x0
        opts = 0x7fa9d0790e90
        machine_opts = 0xfffffffe7fffffff
        icount_opts = 0x0
        olist = 0x7fa9cf03b140 <qemu_machine_opts>
        optind = 12
        optarg = 0x7fa9d0790f40 "none"
        loadvm = 0x0
        machine_class = 0x7fa9d077a160
        cpu_model = 0x0
        vga_model = 0x0
        qtest_chrdev = 0x0
        qtest_log = 0x0
        pid_file = 0x7fff03f8ff59 "/var/lib/libvirt/qemu/capabilities.pidfile"
        incoming = 0x0
        show_vnc_port = 0
        defconfig = true
        userconfig = false
        log_mask = 0x0
        log_file = 0x0
        mem_trace = {malloc = 0x7fa9ce9276a2 <malloc_and_trace>, realloc = 0x7fa9ce9276d7 <realloc_and_trace>, free = 0x7fa9ce92771b <free_and_trace>, calloc = 0x0, try_malloc = 0x0, try_realloc = 0x0}
        trace_events = 0x0
        trace_file = 0x0
        maxram_size = 134217728
        ram_slots = 0
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
        __func__ = "main"