Hi Emilio,

Bisection led me to your patch, named in the subject (please see the
attached bisection log).

I'm on RHEL-7.1. Sometimes I have to work with upstream QEMU, and then I
use it with my preexistent libvirt guests, pulling QEMU somewhat
infrequently. My libvirt-related version numbers are:

libvirtd: 1.2.8-16.el7_1.3.x86_64
libvirt-python: 1.2.8-7.el7_1.1.x86_64
libvirt-g*: 0.1.7-3.el7.x86_64
virt-manager: 1.1.0-12.el7.noarch

The symptom is that, with your patch built into QEMU, QEMU starts but
hangs as soon as I click the specific VM's entry in virt-manager's
list.

In the process list ("ps"), I can then see two qemu processes, parent
and child. I saved backtraces for both of them, while they were hung.
The command lines are also visible in the attached text files. The line
numbers (i.e. the QEMU binary) match the tree when checked out and
built at exactly your patch.

(I double checked: if I build at 5243722376^, then it works.)

The configure command was:

./configure \
  --audio-drv-list=alsa \
  --target-list=x86_64-softmmu,i386-softmmu,aarch64-softmmu \
  --disable-vde \
  --enable-werror \
  --enable-spice \
  --disable-stack-protector \
  --prefix=/opt/qemu-installed \
  --disable-gtk \
  --enable-debug \
  --enable-trace-backends=stderr

I don't think libvirt, or for that matter, any QMP interfaces, have
anything to do with this. I rather believe that libvirt invokes QEMU for
retrieving the capabilities in a way that exposes a possible problem in
your patch. (Hence I provided my libvirt version numbers just to be sure.)

... In fact I'm confused about your patch. rcu_init() makes sure that at
fork(), the parent will first acquire both "rcu_sync_lock" and
"rcu_registry_lock". Meaning, no other thread in the parent can hold
those mutexen when the parent thread calling fork() actually forks.

Then, in the parent, the original thread simply releases both mutexen,
in rcu_init_unlock(). In the child, only the one thread exists that
called fork() in the parent. However, that one child thread does own the
copies of both mutexen. So it is prudent for the child to release both
copies.
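
To make the protocol described above concrete, here is a minimal sketch
using plain pthreads. It is not the QEMU code: "sync_lock",
"registry_lock", the handler names and fork_and_probe() are hypothetical
stand-ins, and default (non-error-checking) mutexes are assumed. The
prepare handler takes both locks, and the parent and the child each
release their own copies.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for rcu_sync_lock and rcu_registry_lock. */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Prepare handler: runs in the forking thread, before the fork. */
static void locks_acquire(void)
{
    pthread_mutex_lock(&sync_lock);
    pthread_mutex_lock(&registry_lock);
}

/* Parent and child handler: the forking thread (or its copy in the
 * child) releases both locks, in reverse order of acquisition. */
static void locks_release(void)
{
    pthread_mutex_unlock(&registry_lock);
    pthread_mutex_unlock(&sync_lock);
}

static void install_handlers(void)
{
    int rc = pthread_atfork(locks_acquire, locks_release, locks_release);
    assert(rc == 0);
    (void)rc;
}

/* Forks once. The child exits with 0 if it can take both locks without
 * blocking, 1 otherwise. Returns the child's exit status, or -1. */
int fork_and_probe(void)
{
    static pthread_once_t once = PTHREAD_ONCE_INIT;
    pid_t pid;
    int status;

    pthread_once(&once, install_handlers);

    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        int busy = pthread_mutex_trylock(&sync_lock) != 0 ||
                   pthread_mutex_trylock(&registry_lock) != 0;
        _exit(busy ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

With both handlers releasing, the child should find both locks free, so
a later fork() in the child can run the prepare handler again.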

Your patch causes "rcu_registry_lock" to be reinitialized in the child,
rather than released, plus "rcu_sync_lock" remains untouched (ie. locked
by the one thread that exists in the child). Why is that correct?
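
A sketch of that concern, under the assumption that the reading of the
patch above is accurate. Again this is not the QEMU code: the lock and
function names are hypothetical, and default mutexes on glibc are
assumed. The child handler reinitializes only the registry lock, and the
child then probes the other lock with pthread_mutex_trylock() so that
the demonstration itself cannot hang.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for rcu_sync_lock and rcu_registry_lock. */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Prepare handler: take both locks before the fork. */
static void before_fork(void)
{
    pthread_mutex_lock(&sync_lock);
    pthread_mutex_lock(&registry_lock);
}

/* Parent handler: release both locks. */
static void after_fork_parent(void)
{
    pthread_mutex_unlock(&registry_lock);
    pthread_mutex_unlock(&sync_lock);
}

/* Child handler: reinitialize one lock and leave the other untouched,
 * so the child's copy of sync_lock stays in the locked state. */
static void after_fork_child(void)
{
    pthread_mutex_init(&registry_lock, NULL);
}

static void install_handlers(void)
{
    int rc = pthread_atfork(before_fork, after_fork_parent,
                            after_fork_child);
    assert(rc == 0);
    (void)rc;
}

/* Forks once. The child exits with 1 if sync_lock is still busy, 0 if
 * it is free. Returns the child's exit status, or -1 on error. */
int child_sync_lock_probe(void)
{
    static pthread_once_t once = PTHREAD_ONCE_INIT;
    pid_t pid;
    int status;

    pthread_once(&once, install_handlers);

    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* A blocking lock here, or a second fork() that runs
         * before_fork(), would wait forever instead. */
        _exit(pthread_mutex_trylock(&sync_lock) == EBUSY ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

In this sketch the child reports the lock as busy. That would be
consistent with the attached child backtrace, where a second fork() in
os_daemonize() blocks in rcu_init_lock() on "rcu_sync_lock".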

(Side note: we're talking process-private, not process-shared mutexen.)

I could easily be wrong, but I don't understand the commit message, or
why the patch is correct.

... Hm, I can see the discussion here:

http://thread.gmane.org/gmane.comp.emulators.qemu/356765/focus=360421

Okay... let me see 24fa90499f... "The problem is that releasing
error-checking locks in the child fails under glibc with EPERM". <--
That is a striking surprise to me, but still, the removal of
PTHREAD_MUTEX_ERRORCHECK only justifies why your patch would *not* be
necessary.

The last paragraph of your email that I linked above talks about a
"possibility of corruption". Maybe I've managed to trigger that. If so,
I hope it won't be hard to fix up.

... Hm, apparently Alex had mentioned the same concern as I did now,
about ignoring "rcu_sync_lock" in the child, in message
<http://thread.gmane.org/gmane.comp.emulators.qemu/356765/focus=360602>.
Was that concern cleared up eventually?

Thanks!
Laszlo
git bisect start
# bad: [619622424dba749feef752d76d79ef2569f7f250] Merge remote-tracking branch 'remotes/berrange/tags/vnc-crypto-v9-for-upstream' into staging
git bisect bad 619622424dba749feef752d76d79ef2569f7f250
# good: [2b750d9d261bda7f75b39dfc1e1e5f22502929d5] Merge remote-tracking branch 'remotes/aurel/tags/pull-sh4-next-20150913' into staging
git bisect good 2b750d9d261bda7f75b39dfc1e1e5f22502929d5
# bad: [a2aa09e18186801931763fbd40a751fa39971b18] Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
git bisect bad a2aa09e18186801931763fbd40a751fa39971b18
# bad: [0c71d41e2aa3c7356500ae624166f3bb8c201aee] scripts/dump-guest-memory.py: fix after RAMBlock change
git bisect bad 0c71d41e2aa3c7356500ae624166f3bb8c201aee
# good: [3c9589e180d98cdadb143bd2a792fb9d19d9aec6] Move RAMBlock and ram_list to ram_addr.h
git bisect good 3c9589e180d98cdadb143bd2a792fb9d19d9aec6
# bad: [3904e6bf042391abc749d717465022e96e276fc7] cutils: Add qemu_strtoull() wrapper
git bisect bad 3904e6bf042391abc749d717465022e96e276fc7
# bad: [709037636992e9289ce9147e59d56fb35d90b140] linux-user: call rcu_(un)register_thread on pthread_(exit|create)
git bisect bad 709037636992e9289ce9147e59d56fb35d90b140
# bad: [5243722376873a48e9852a58b91f4d4101ee66e4] rcu: init rcu_registry_lock after fork
git bisect bad 5243722376873a48e9852a58b91f4d4101ee66e4
# good: [12a1ddc160cb6a73e8a6c319f3962a20da2cd22f] Makefile.target: include top level build dir in vpath
git bisect good 12a1ddc160cb6a73e8a6c319f3962a20da2cd22f
# first bad commit: [5243722376873a48e9852a58b91f4d4101ee66e4] rcu: init rcu_registry_lock after fork
UID        PID  PPID  C STIME TTY          TIME CMD
qemu     17305  1752  0 13:24 ?        00:00:00 
/opt/qemu-installed/bin/qemu-system-i386 -S -no-user-config -nodefaults 
-nographic -M none -qmp 
unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile 
/var/lib/libvirt/qemu/capabilities.pidfile -daemonize

(gdb) thread apply all bt full

Thread 2 (Thread 0x7fa9c3db7700 (LWP 17306)):
#0  0x00007fa9c7dda949 in syscall () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fa9cebc0f73 in futex_wait (ev=0x7fa9cf5245a4 <rcu_call_ready_event>, 
val=4294967295) at util/qemu-thread-posix.c:301
No locals.
#2  0x00007fa9cebc106a in qemu_event_wait (ev=0x7fa9cf5245a4 
<rcu_call_ready_event>) at util/qemu-thread-posix.c:408
        value = 1
#3  0x00007fa9cebd4666 in call_rcu_thread (opaque=0x0) at util/rcu.c:254
        tries = 0
        n = 0
        node = 0x7fa9ce712990
#4  0x00007fa9cd2fedf5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#5  0x00007fa9c7de01ad in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (Thread 0x7fa9ce6f2bc0 (LWP 17305)):
#0  0x00007fa9cd30525d in read () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fa9ce915c7c in os_daemonize () at os-posix.c:223
        status = 0 '\000'
        len = 140733260032912
        pid = 17307
        fds = {4, 5}
#2  0x00007fa9ce92a803 in main (argc=12, argv=0x7fff03f8efd8, 
envp=0x7fff03f8f040) at vl.c:4034
        i = 0
        snapshot = 0
        linux_boot = 0
        initrd_filename = 0x7fa9d0749eb0 "îkÅΩ\177"
        kernel_filename = 0x7fa9d0749ea0 ""
        kernel_cmdline = 0x7fa9cebd4e20 <__libc_csu_init> 
"AWA\211ÿAVI\211öAUI\211ÕATL\215%"
        boot_order = 0x0
        boot_once = 0x0
        ds = 0x7fa9cec56d38
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        hda_opts = 0x0
        opts = 0x7fa9d0790e90
        machine_opts = 0xfffffffe7fffffff
        icount_opts = 0x0
        olist = 0x7fa9cf03b140 <qemu_machine_opts>
        optind = 12
        optarg = 0x7fa9d0790f40 "none"
        loadvm = 0x0
        machine_class = 0x7fa9d077a160
        cpu_model = 0x0
        vga_model = 0x0
        qtest_chrdev = 0x0
        qtest_log = 0x0
        pid_file = 0x7fff03f8ff59 "/var/lib/libvirt/qemu/capabilities.pidfile"
        incoming = 0x0
        show_vnc_port = 0
        defconfig = true
        userconfig = false
        log_mask = 0x0
        log_file = 0x0
        mem_trace = {malloc = 0x7fa9ce9276a2 <malloc_and_trace>, realloc = 
0x7fa9ce9276d7 <realloc_and_trace>, free = 0x7fa9ce92771b <free_and_trace>, 
calloc = 0x0, try_malloc = 0x0, try_realloc = 0x0}
        trace_events = 0x0
        trace_file = 0x0
        maxram_size = 134217728
        ram_slots = 0
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
        __func__ = "main"

UID        PID  PPID  C STIME TTY          TIME CMD
qemu     17307 17305  0 13:24 ?        00:00:00 
/opt/qemu-installed/bin/qemu-system-i386 -S -no-user-config -nodefaults 
-nographic -M none -qmp 
unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile 
/var/lib/libvirt/qemu/capabilities.pidfile -daemonize

(gdb) thread apply all bt full

Thread 1 (Thread 0x7fa9ce6f2bc0 (LWP 17307)):
#0  0x00007fa9cd304f7d in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fa9cd300d32 in _L_lock_791 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00007fa9cd300c38 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007fa9cebc0ad1 in qemu_mutex_lock (mutex=0x7fa9cf524560 
<rcu_sync_lock>) at util/qemu-thread-posix.c:73
        err = 0
        __func__ = "qemu_mutex_lock"
#4  0x00007fa9cebd491a in rcu_init_lock () at util/rcu.c:329
No locals.
#5  0x00007fa9c7da7512 in fork () from /lib64/libc.so.6
No symbol table info available.
#6  0x00007fa9ce915cef in os_daemonize () at os-posix.c:240
        pid = 0
        fds = {4, 5}
#7  0x00007fa9ce92a803 in main (argc=12, argv=0x7fff03f8efd8, 
envp=0x7fff03f8f040) at vl.c:4034
        i = 0
        snapshot = 0
        linux_boot = 0
        initrd_filename = 0x7fa9d0749eb0 "îkÅΩ\177"
        kernel_filename = 0x7fa9d0749ea0 ""
        kernel_cmdline = 0x7fa9cebd4e20 <__libc_csu_init> 
"AWA\211ÿAVI\211öAUI\211ÕATL\215%"
        boot_order = 0x0
        boot_once = 0x0
        ds = 0x7fa9cec56d38
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        hda_opts = 0x0
        opts = 0x7fa9d0790e90
        machine_opts = 0xfffffffe7fffffff
        icount_opts = 0x0
        olist = 0x7fa9cf03b140 <qemu_machine_opts>
        optind = 12
        optarg = 0x7fa9d0790f40 "none"
        loadvm = 0x0
        machine_class = 0x7fa9d077a160
        cpu_model = 0x0
        vga_model = 0x0
        qtest_chrdev = 0x0
        qtest_log = 0x0
        pid_file = 0x7fff03f8ff59 "/var/lib/libvirt/qemu/capabilities.pidfile"
        incoming = 0x0
        show_vnc_port = 0
        defconfig = true
        userconfig = false
        log_mask = 0x0
        log_file = 0x0
        mem_trace = {malloc = 0x7fa9ce9276a2 <malloc_and_trace>, realloc = 
0x7fa9ce9276d7 <realloc_and_trace>, free = 0x7fa9ce92771b <free_and_trace>, 
calloc = 0x0, try_malloc = 0x0, try_realloc = 0x0}
        trace_events = 0x0
        trace_file = 0x0
        maxram_size = 134217728
        ram_slots = 0
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
        __func__ = "main"
