On Mon, 05/23 14:54, Jason J. Herne wrote:
> Using libvirt to migrate a guest and one guest disk that is using
> iothreads causes QEMU to crash with the message:
>
>   Co-routine re-entered recursively
>
> I've looked into this one a bit but I have not seen anything that
> immediately stands out. Here is what I have found:
>
> In qemu_coroutine_enter:
>
>     if (co->caller) {
>         fprintf(stderr, "Co-routine re-entered recursively\n");
>         abort();
>     }
>
> The value of co->caller is actually changing between the time "if
> (co->caller)" is evaluated and the time I print some debug statements
> directly under the existing fprintf. I confirmed this by saving the value
> in a local variable and printing both the new local variable and
> co->caller immediately after the existing fprintf. This certainly
> indicates some kind of concurrency issue. However, it does not
> necessarily explain how we ended up inside this if statement, because
> co->caller was not NULL before it was trashed. Perhaps it was trashed
> more than once, then? I figured the problem might be with coroutine
> pools, so I disabled them (--disable-coroutine-pool), but I still hit
> the bug.
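The instrumentation you describe would amount to something like the sketch
below in qemu_coroutine_enter (just restating your experiment; the
snapshot variable name is mine):

    Coroutine *caller_snapshot = co->caller;  /* value seen at check time */

    if (caller_snapshot) {
        fprintf(stderr, "Co-routine re-entered recursively\n");
        /* A fresh read of co->caller here can disagree with the snapshot,
         * meaning another thread modified it between the check and this
         * print, i.e. a second qemu_coroutine_enter() is racing with this
         * one on the same Coroutine. */
        fprintf(stderr, "caller at check: %p, caller now: %p\n",
                (void *)caller_snapshot, (void *)co->caller);
        abort();
    }

If the two printed values differ, that really does point at two threads
entering the same coroutine concurrently, since co->caller is normally
only touched on the enter/yield paths.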
Which coroutine backend are you using?

> The backtrace is not always identical. Here is one instance:
>
> (gdb) bt
> #0  0x000003ffa78be2c0 in raise () from /lib64/libc.so.6
> #1  0x000003ffa78bfc26 in abort () from /lib64/libc.so.6
> #2  0x0000000080427d80 in qemu_coroutine_enter (co=0xa2cf2b40, opaque=0x0)
>     at /root/kvmdev/qemu/util/qemu-coroutine.c:112
> #3  0x000000008032246e in nbd_restart_write (opaque=0xa2d0cd40)
>     at /root/kvmdev/qemu/block/nbd-client.c:114
> #4  0x00000000802b3a1c in aio_dispatch (ctx=0xa2c907a0)
>     at /root/kvmdev/qemu/aio-posix.c:341
> #5  0x00000000802b4332 in aio_poll (ctx=0xa2c907a0, blocking=true)
>     at /root/kvmdev/qemu/aio-posix.c:479
> #6  0x0000000080155aba in iothread_run (opaque=0xa2c90260)
>     at /root/kvmdev/qemu/iothread.c:46
> #7  0x000003ffa7a87c2c in start_thread () from /lib64/libpthread.so.0
> #8  0x000003ffa798ec9a in thread_start () from /lib64/libc.so.6

It may be worth looking at the backtraces of all threads, especially the
monitor thread (main thread).

> I've also noticed that co->entry sometimes (maybe always?) points to
> mirror_run. Though, given that co->caller changes unexpectedly, I don't
> know if we can trust co->entry.
>
> I do not see the bug when I perform the same migration without migrating
> the disk. I also do not see the bug when I remove the iothread from the
> guest.
>
> I tested this scenario as far back as tag v2.4.0 and hit the bug every
> time. I was unable to test v2.3.0 due to unresolved guest hangs. I did,
> however, manage to get as far back as this commit:
>
> commit ca96ac44dcd290566090b2435bc828fded356ad9
> Author: Stefan Hajnoczi <stefa...@redhat.com>
> Date:   Tue Jul 28 18:34:09 2015 +0200
>
>     AioContext: force event loop iteration using BH
>
> This commit fixes a hang that my test scenario experiences. I was able to
> test even further back by cherry-picking ca96ac44 on top of the earlier
> commits, but at that point I could not be sure whether the bug was
> introduced by ca96ac44, so I stopped.
>
> I am willing to run tests or collect any info needed. I'll keep
> investigating, but I won't turn down any help :).
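On the all-threads backtrace: attaching gdb to the running QEMU and
dumping every thread should be enough, e.g.:

    $ gdb -p $(pidof qemu-system-s390x)
    (gdb) thread apply all bt

If the main loop thread turns out to be inside the same coroutine (for
instance also in qemu_coroutine_enter, or in the mirror job) at the moment
the iothread re-enters it, that would confirm the concurrent-entry theory.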
>
> QEMU command line as taken from the libvirt log:
>
> qemu-system-s390x
>   -name kvm1 -S -machine s390-ccw-virtio-2.6,accel=kvm,usb=off
>   -m 6144 -realtime mlock=off
>   -smp 1,sockets=1,cores=1,threads=1
>   -object iothread,id=iothread1
>   -uuid 3796d9f0-8555-4a1e-9d5c-fac56b8cbf56
>   -nographic -no-user-config -nodefaults
>   -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-kvm1/monitor.sock,server,nowait
>   -mon chardev=charmonitor,id=monitor,mode=control
>   -rtc base=utc -no-shutdown
>   -boot strict=on -kernel /data/vms/kvm1/kvm1-image
>   -initrd /data/vms/kvm1/kvm1-initrd -append 'hvc_iucv=8 TERM=dumb'
>   -drive file=/dev/disk/by-path/ccw-0.0.c22b,format=raw,if=none,id=drive-virtio-disk0,cache=none
>   -device virtio-blk-ccw,scsi=off,devno=fe.0.0000,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>   -drive file=/data/vms/kvm1/kvm1.qcow,format=qcow2,if=none,id=drive-virtio-disk1,cache=none
>   -device virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0008,drive=drive-virtio-disk1,id=virtio-disk1
>   -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27
>   -device virtio-net-ccw,netdev=hostnet0,id=net0,mac=52:54:00:c9:86:2b,devno=fe.0.0001
>   -chardev pty,id=charconsole0
>   -device sclpconsole,chardev=charconsole0,id=console0
>   -device virtio-balloon-ccw,id=balloon0,devno=fe.0.0002
>   -msg timestamp=on
>
> Libvirt migration command:
>
> virsh migrate --live --persistent --copy-storage-all --migrate-disks vdb \
>       kvm1 qemu+ssh://dev1/system
>
> --
> -- Jason J. Herne (jjhe...@linux.vnet.ibm.com)
>
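One more note on why mirror_run and nbd-client.c show up in the trace:
with --copy-storage-all, libvirt starts an NBD server on the destination
and kicks off a drive-mirror job on the source that writes to it. In QMP
terms it is roughly the following (the port and export name here are made
up for illustration; libvirt picks the real values):

    # on the destination
    {"execute": "nbd-server-start",
     "arguments": {"addr": {"type": "inet",
                            "data": {"host": "dev1", "port": "49153"}}}}
    {"execute": "nbd-server-add",
     "arguments": {"device": "drive-virtio-disk1", "writable": true}}

    # on the source
    {"execute": "drive-mirror",
     "arguments": {"device": "drive-virtio-disk1",
                   "target": "nbd:dev1:49153:exportname=drive-virtio-disk1",
                   "sync": "full", "format": "raw", "mode": "existing"}}

So during the migration the mirror coroutine is issuing NBD writes, which
is how nbd_restart_write() in the iothread (frame #3 above) ends up
entering it.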