[Qemu-devel] qemu 2.0, deadlock in block-commit
Hi, I've encountered a deadlock in qemu during some stress testing. The test is making snapshots, committing them and constantly querying for block job info. The version of QEMU is 2.0.0 rc3 (the backtrace below says rc2, but it's manually patched to rc3); there seem to be no changes in the block layer in the final 2.0 (?). This is the backtrace of the qemu process:

(gdb) thread apply all backtrace

Thread 22 (Thread 0x7f6994852700 (LWP 13651)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699ab4c4eb in ?? () from /usr/lib64/librados.so.2
#2 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f698700 (LWP 13652)):
#0 0x7f69982f5ff1 in sem_timedwait () from /lib64/libpthread.so.0
#1 0x7f699ac3e1b8 in ?? () from /usr/lib64/librados.so.2
#2 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f698f7fe700 (LWP 13653)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699ab7b383 in ?? () from /usr/lib64/librados.so.2
#2 0x7f699abe625d in ?? () from /usr/lib64/librados.so.2
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f698effd700 (LWP 13654)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699abe1c88 in ?? () from /usr/lib64/librados.so.2
#2 0x7f699abe6a6d in ?? () from /usr/lib64/librados.so.2
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f698e7fc700 (LWP 13655)):
#0 0x7f69982f40de in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699aaeced8 in ?? () from /usr/lib64/librados.so.2
#2 0x7f699aaede0d in ?? () from /usr/lib64/librados.so.2
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f698dffb700 (LWP 13656)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699aaee862 in ?? () from /usr/lib64/librados.so.2
#2 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f698d7fa700 (LWP 13657)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699abd288e in ?? () from /usr/lib64/librados.so.2
#2 0x7f699abddf1d in ?? () from /usr/lib64/librados.so.2
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f698d6f9700 (LWP 13658)):
#0 0x7f699802007d in poll () from /lib64/libc.so.6
#1 0x7f699abc56ac in ?? () from /usr/lib64/librados.so.2
#2 0x7f699abc7460 in ?? () from /usr/lib64/librados.so.2
#3 0x7f699abd9c2c in ?? () from /usr/lib64/librados.so.2
#4 0x7f699abde03d in ?? () from /usr/lib64/librados.so.2
#5 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#6 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f698d5f8700 (LWP 13659)):
#0 0x7f69982f40de in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699aaeced8 in ?? () from /usr/lib64/librados.so.2
#2 0x7f699aaede0d in ?? () from /usr/lib64/librados.so.2
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f698cdf7700 (LWP 13660)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699aaee862 in ?? () from /usr/lib64/librados.so.2
#2 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f697700 (LWP 13661)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699aaee862 in ?? () from /usr/lib64/librados.so.2
#2 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f697f7fe700 (LWP 13662)):
#0 0x7f69982f40de in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699a5bc666 in ?? () from /usr/lib64/librbd.so.1
#2 0x7f699a5cf76d in ?? () from /usr/lib64/librbd.so.1
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f698c5f6700 (LWP 13663)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpt
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
On 2014-05-22 22:49, Marcin Gibuła wrote:
> Thread 1 (Thread 0x7f699bfcd900 (LWP 13647)):
> #0 0x7f6998020286 in ppoll () from /lib64/libc.so.6
> #1 0x7f699c1f3d9b in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
> #2 qemu_poll_ns (fds=, nfds=, timeout=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:311
> #3 0x7f699c0877e0 in aio_poll (ctx=0x7f699e4c9c00, blocking=blocking@entry=true) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/aio-posix.c:221
> #4 0x7f699c095c0a in bdrv_drain_all () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805

Some more info. The VM was doing a lot of write IO during this test.

ppoll() is listening for these descriptors (from strace):

ppoll([{fd=25, events=POLLIN|POLLERR|POLLHUP}, {fd=23, events=POLLIN|POLLERR|POLLHUP}, {fd=17, events=POLLIN|POLLERR|POLLHUP}, {fd=4, events=POLLIN|POLLERR|POLLHUP}], 4, NULL, NULL, 8, ...)

fd # ls -l 25 23 17 4
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 17 -> anon_inode:[eventfd]
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 23 -> anon_inode:[eventfd]
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 25 -> anon_inode:[eventfd]
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 4 -> anon_inode:[eventfd]

The VM is started via libvirt. No errors are reported in the logs. The command line is:

/usr/bin/qemu-system-x86_64 -machine accel=kvm -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,-kvmclock -m 1536 -realtime mlock=on -smp 2,sockets=2,cores=10,threads=1 -uuid 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -smbios type=0,vendor=HAL 9000 -smbios type=1,manufacturer=cloud -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/68189c3c-02f6-4aae-88a2-5f13c5e6f53a.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/dev/cube2/5f751718-ff36-420f-b034-5f31230b5f23,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/dev/cube1/c5b7a6e3-11f8-4b08-ac3e-5ea054028221,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/mnt/nfs/volumes/66346c1b-add5-4412-89d9-b00a3bb13e75/72be1b50-982e-458a-9a84-c0fbd48b4b3c.qcow2,if=none,id=drive-virtio-disk2,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/mnt/nfs/volumes/a20c3b29-6f21-4b3d-a3fb-8b80599e50df/b84716ea-2564-47cc-bbbf-dea6029132b4.qcow2,if=none,id=drive-virtio-disk3,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk3,id=virtio-disk3 -drive file=/mnt/nfs/volumes/0c2996b5-abec-47ea-9e88-ebd7ebf0c79d/453cb20a-1705-45e2-9f9e-bc1ea096d52a.qcow2,if=none,id=drive-virtio-disk4,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk4,id=virtio-disk4 -drive file=/mnt/nfs/volumes/7dcbd9ba-f0bc-4d3c-9b5c-b2ac824584d5/a8bb7e11-a9b5-4613-9b63-b9722fba2166.qcow2,if=none,id=drive-virtio-disk5,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xb,drive=drive-virtio-disk5,id=virtio-disk5 -drive file=rbd:iso-images/rescue.iso:auth_supported=none,if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=19,id=hostnet0,vhost=on,vhostfd=20 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:82:41:c9,bus=pci.0,addr=0x3 -netdev tap,fd=21,id=hostnet1,vhost=on,vhostfd=22 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:70:10:35,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/68189c3c-02f6-4aae-88a2-5f13c5e6f53a.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/68189c3c-02f6-4aae-88a2-5f13c5e6f53a.cloud.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=cha
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
>> I've encountered a deadlock in qemu during some stress testing. The test is making snapshots, committing them and constantly querying for block job info.
>
> What is the exact command you used for triggering the block-commit? Was it via direct HMP or QMP, or indirect via libvirt?

Via libvirt.

> Were you trying to commit the active layer?

No. The commit was to an intermediate file. I'm aware that libvirt does not support active layer commit yet. Plus, judging from the backtrace, the hang seems to be deep inside qemu. The VM is unresponsive after this.

--
mg
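[Editorial note: when libvirt performs an intermediate (non-active) commit like the one described here, it ends up issuing a QMP block-commit command. A hypothetical invocation against one of the disks from this thread's command line - the snapshot file names below are made up for illustration - would look roughly like:]

{ "execute": "block-commit",
  "arguments": { "device": "drive-virtio-disk2",
                 "top": "/mnt/nfs/volumes/.../snap1.qcow2",
                 "base": "/mnt/nfs/volumes/.../base.qcow2" } }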
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
On 23.05.2014 10:19, Paolo Bonzini wrote:
> On 22/05/2014 23:05, Marcin Gibuła wrote:
>> Some more info. The VM was doing a lot of write IO during this test.
>
> QEMU is waiting for librados to complete I/O. Can you reproduce it with a different driver?

I'll try. However, RBD is used only as a read-only ISO (rbd:iso-images/rescue.iso:auth_supported=none,if=none,id=drive-ide0-0-0,readonly=on,format=raw) - what IO would it have to complete?

--
mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
On 23.05.2014 10:19, Paolo Bonzini wrote:
> On 22/05/2014 23:05, Marcin Gibuła wrote:
>> Some more info. The VM was doing a lot of write IO during this test.
>
> QEMU is waiting for librados to complete I/O. Can you reproduce it with a different driver?

Hi, I've reproduced it without RBD. Backtrace below:

(gdb) thread apply all backtrace

Thread 4 (Thread 0x7f9c8cccd700 (LWP 2017)):
#0 0x7f9c907717a4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x7f9c9076d19c in _L_lock_518 () from /lib64/libpthread.so.0
#2 0x7f9c9076cfeb in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x7f9c947addf9 in qemu_mutex_lock (mutex=mutex@entry=0x7f9c95002660 ) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:76
#4 0x7f9c946b3a10 in qemu_mutex_lock_iothread () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:1043
#5 0x7f9c9470cf3d in kvm_cpu_exec (cpu=cpu@entry=0x7f9c968bf290) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1683
#6 0x7f9c946b271c in qemu_kvm_cpu_thread_fn (arg=0x7f9c968bf290) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#7 0x7f9c9076af3a in start_thread () from /lib64/libpthread.so.0
#8 0x7f9c904a4dad in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f9c87fff700 (LWP 2018)):
#0 0x7f9c9049c897 in ioctl () from /lib64/libc.so.6
#1 0x7f9c9470cdf9 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7f9c968fa300, type=type@entry=44672) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1796
#2 0x7f9c9470cf35 in kvm_cpu_exec (cpu=cpu@entry=0x7f9c968fa300) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1681
#3 0x7f9c946b271c in qemu_kvm_cpu_thread_fn (arg=0x7f9c968fa300) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#4 0x7f9c9076af3a in start_thread () from /lib64/libpthread.so.0
#5 0x7f9c904a4dad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f9c869ff700 (LWP 2020)):
#0 0x7f9c9076ed0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f9c947ae019 in qemu_cond_wait (cond=cond@entry=0x7f9c9695a250, mutex=mutex@entry=0x7f9c9695a280) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:135
#2 0x7f9c946a270b in vnc_worker_thread_loop (queue=queue@entry=0x7f9c9695a250) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:222
#3 0x7f9c946a2ae0 in vnc_worker_thread (arg=0x7f9c9695a250) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:323
#4 0x7f9c9076af3a in start_thread () from /lib64/libpthread.so.0
#5 0x7f9c904a4dad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f9c94448900 (LWP 2013)):
#0 0x7f9c9049b286 in ppoll () from /lib64/libc.so.6
#1 0x7f9c9466ed9b in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=, nfds=, timeout=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:311
#3 0x7f9c945027e0 in aio_poll (ctx=0x7f9c95d5bc00, blocking=blocking@entry=true) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/aio-posix.c:221
#4 0x7f9c94510c0a in bdrv_drain_all () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805
#5 0x7f9c9451787e in bdrv_close (bs=bs@entry=0x7f9c969b7d90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1695
#6 0x7f9c945175fa in bdrv_delete (bs=0x7f9c969b7d90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1978
#7 bdrv_unref (bs=0x7f9c969b7d90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:5198
#8 0x7f9c94517812 in bdrv_drop_intermediate (active=active@entry=0x7f9c9648f490, top=top@entry=0x7f9c969b7d90, base=base@entry=0x7f9c96756500) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:2567
#9 0x7f9c9451c963 in commit_run (opaque=0x7f9c96a1e280) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block/commit.c:144
#10 0x7f9c9455bdca in coroutine_trampoline (i0=, i1=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/coroutine-ucontext.c:118
#11 0x7f9c904009f0 in ?? () from /lib64/libc.so.6
#12 0x7fffe4bcfee0 in ?? ()
#13 0x in ?? ()

I still have this process running (hanging ;)) if you need any more info. I also have no problems with reproducing it.

--
mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> I see that you have a mix of aio=native and aio=threads. I can't say much about the aio=native disks (perhaps try to reproduce without them?), but there are definitely no worker threads for the other disks that bdrv_drain_all() would have to wait for.

True. But I/O was being done only to the qcow2 disk with the threads backend. And the snapshot was made on this disk. I'll try to reproduce with all 'threads'.

> bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the function that determines for each of the disks in your VM if it still has requests in flight that need to be completed. This function must have returned true even though there is nothing to wait for. Can you check which of its conditions led to this behaviour, and for which disk it did? Either by setting a breakpoint there and singlestepping through the function the next time it is called (if the poll even has a timeout), or by inspecting the conditions manually in gdb.

I'm on it.

--
mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the function that determines for each of the disks in your VM if it still has requests in flight that need to be completed. This function must have returned true even though there is nothing to wait for. Can you check which of its conditions led to this behaviour, and for which disk it did? Either by setting a breakpoint there and singlestepping through the function the next time it is called (if the poll even has a timeout), or by inspecting the conditions manually in gdb.

The condition that is true is:

if (!QLIST_EMPTY(&bs->tracked_requests))

and it's returned for the intermediate qcow2 which is being committed.

--
mg
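[Editorial note: for context, this is roughly what the function under discussion looked like in 2.0-era block.c - a sketch from memory, not a verbatim copy, so line-level details may differ:]

static bool bdrv_requests_pending(BlockDriverState *bs)
{
    /* In-flight reads/writes are kept on this list. */
    if (!QLIST_EMPTY(&bs->tracked_requests)) {
        return true;
    }
    /* Requests queued by I/O throttling (separate read/write queues). */
    if (!qemu_co_queue_empty(&bs->throttled_reqs[0])) {
        return true;
    }
    if (!qemu_co_queue_empty(&bs->throttled_reqs[1])) {
        return true;
    }
    /* Recurse into the protocol layer and the backing file. */
    if (bs->file && bdrv_requests_pending(bs->file)) {
        return true;
    }
    if (bs->backing_hd && bdrv_requests_pending(bs->backing_hd)) {
        return true;
    }
    return false;
}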
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> The condition that is true is:
>
> if (!QLIST_EMPTY(&bs->tracked_requests))
>
> and it's returned for the intermediate qcow2 which is being committed.

Btw - it's also the disk that is being pounded with writes during the commit.

--
mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> If you see a pending request on a RADOS block device (rbd) then it would be good to dig deeper into QEMU's block/rbd.c driver to see why it's not completing that request. Are you using qcow2 on top of rbd?

Hi, I've already recreated this without rbd and with stock qemu 2.0.

--
mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
On 2014-05-23 15:14, Marcin Gibuła wrote:
>> bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the function that determines for each of the disks in your VM if it still has requests in flight that need to be completed. This function must have returned true even though there is nothing to wait for. Can you check which of its conditions led to this behaviour, and for which disk it did? Either by setting a breakpoint there and singlestepping through the function the next time it is called (if the poll even has a timeout), or by inspecting the conditions manually in gdb.
>
> The condition that is true is:
>
> if (!QLIST_EMPTY(&bs->tracked_requests))
>
> and it's returned for the intermediate qcow2 which is being committed.

My mistake - this condition is true not for the intermediate file, but for the active one. Sorry for the confusion.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> Does anybody know why the APIC state loaded by the first call to kvm_arch_get_registers() is wrong, in the first place? What exactly is different in the APIC state in the second kvm_arch_get_registers() call, and when/why does it change? If cpu_synchronize_state() does the wrong thing if it is called at the wrong moment, then we may have other hidden bugs, because the user can trigger cpu_synchronize_all_states() calls arbitrarily using monitor commands.

My guess is, it's not wrong, it's just outdated when the second call occurs. Maybe it's an ordering issue - could the kvmclock state change handler be called before other activity is suspended?

I didn't pursue it further, because I don't know much (anything, really) about QEMU/APIC internals and how to track their changes.

--
mg
Re: [Qemu-devel] [PATCH uq/master] kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation
> @@ -65,6 +66,7 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>
>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>
> +    assert(time.tsc_timestamp <= migration_tsc);
>      delta = migration_tsc - time.tsc_timestamp;
>      if (time.tsc_shift < 0) {
>          delta >>= -time.tsc_shift;
> @@ -123,6 +125,8 @@ static void kvmclock_vm_state_change(void *opaque, int running,
>          if (s->clock_valid) {
>              return;
>          }
> +
> +        cpu_synchronize_all_states();
>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>          if (ret < 0) {
>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
>
> This causes a hang during migration, so I'll revert the patch from 2.1.

For me this patch series fixed all hangs I had with migration (at least with qemu 2.0).

--
mg
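[Editorial note: a sketch of the failure mode the new assert guards against (my reading; both values are unsigned 64-bit TSC readings, as in the patch context):]

/* If the guest-visible kvmclock tsc_timestamp is *ahead* of the TSC
 * value sampled for migration, this unsigned subtraction wraps around
 * and delta becomes a huge bogus value that feeds into the guest's
 * clock - i.e. guest time jumps far into the future. */
uint64_t delta = migration_tsc - time.tsc_timestamp;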
Re: [Qemu-devel] [PATCH v2 0/2] thread-pool: avoid fd usage and fix nested aio_poll() deadlock
On 2014-07-15 17:17, Paolo Bonzini wrote:
> On 15/07/2014 16:44, Stefan Hajnoczi wrote:
>> v2:
>>  * Leave BH scheduled so that the code can be simplified [Paolo]
>>
>> These patches convert thread-pool.c from EventNotifier to QEMUBH. They then solve the deadlock when nested aio_poll() calls are made.
>>
>> Please speak out whether you want this in QEMU 2.1 or not. I'm not aware of the nested aio_poll() deadlock ever having been reported, so maybe we can defer to QEMU 2.2.
>
> It was reported as a hang in block_commit. Marcin, can you please test these patches?

I'll try to test it tomorrow. The same hang also existed in linux-aio, however (I was able to reproduce it).

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> Andrey, can you please provide instructions on how to create a reproducible environment?
>
> The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing.

Marcelo, the original reason for the patch adding cpu_synchronize_all_states() there was that this bug affected non-migration operations as well - http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html. Won't moving it only to the migration code break these things again?

> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 272a88a..feb5fc5 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -17,7 +17,6 @@
>  #include "qemu/host-utils.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
> -#include "sysemu/cpus.h"
>  #include "hw/sysbus.h"
>  #include "hw/kvm/clock.h"
>
> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>
>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>
> -    assert(time.tsc_timestamp <= migration_tsc);
>      delta = migration_tsc - time.tsc_timestamp;
>      if (time.tsc_shift < 0) {
>          delta >>= -time.tsc_shift;
> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
>          if (s->clock_valid) {
>              return;
>          }
> -
> -        cpu_synchronize_all_states();
>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>          if (ret < 0) {
>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> diff --git a/migration.c b/migration.c
> index 8d675b3..34f2325 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>              qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>              old_vm_running = runstate_is_running();
>
> +            cpu_synchronize_all_states();
>              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>              if (ret >= 0) {
>                  qemu_file_set_rate_limit(s->file, INT64_MAX);

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> Tested on an iscsi pool (there is a no-cache requirement; rbd with cache disabled may survive one migration, but the iscsi backend always hangs). As before, just rolling back the problematic commit fixes the problem, and adding cpu_synchronize_all_states to migration.c makes no difference at a glance in the VM's behavior.
>
> The problem consists of at least two separate ones: the current hang, and the behavior with the unreverted patch from agraf - the latter causes live migration with writeback cache to fail; cache=none works well in any variant which survives the first condition.
>
> Marcin, would you mind checking the current state of the problem on your environments in your spare time? It is probably easier to reproduce on iscsi because of the way smaller time needed to set it up; command line and libvirt config attached (v2.1.0-rc2 plus iscsi-1.11.0).

Ok, but what exactly do you want me to test? Just to avoid any confusion, originally there were two problems with kvmclock:

1. Commit a096b3a6732f846ec57dc28b47ee9435aa0609bf fixes a problem where clock drift (?) caused kvmclock in the guest to report a time in the past, which caused the guest kernel to hang. This is hard to reproduce reliably (probably because it requires a long time for the drift to accumulate).

2. Commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 fixes a regression caused by a096b3a6732f846ec57dc28b47ee9435aa0609bf which occurred during non-migration operations (drive-mirror + pivot), and which also caused the guest kernel to hang. This is trivial to reproduce.

I'm using both of them applied on top of 2.0 in production and have no problems with them. I'm using NFS exclusively with cache=none.

So, I shall test vm-migration and drive-migration with 2.1.0-rc2, with no extra patches applied or reverted, on a VM that is running fio - am I correct?

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> I'm using both of them applied on top of 2.0 in production and have no problems with them. I'm using NFS exclusively with cache=none.
>>
>> So, I shall test vm-migration and drive-migration with 2.1.0-rc2, with no extra patches applied or reverted, on a VM that is running fio - am I correct?
>
> Yes, exactly. An ISCSI-based setup can take some minutes to deploy, given a prepared image, and I have a one hundred percent hit rate for the original issue with it.

I've reproduced your IO hang with 2.0 and both 9b1786829aefb83f37a8f3135e3ea91c56001b56 and a096b3a6732f846ec57dc28b47ee9435aa0609bf applied. Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 indeed fixes the problem (but reintroduces the block-migration hang).

It seems like a qemu bug rather than a guest problem, as the no-kvmclock parameter makes no difference. IO just stops; all qemu IO threads die off. Almost like it forgets to migrate them :-)

I'm attaching a backtrace from the guest kernel and qemu, and the qemu command line. Going to compile 2.1-rc.

--
mg

[ 254.634525] SysRq : Show Blocked State
[ 254.635041] task PC stack pid father
[ 254.635304] kworker/0:2 D 88013fc145c0 083 2 0x
[ 254.635304] Workqueue: xfs-log/vdb xfs_log_worker [xfs]
[ 254.635304] 880136bdfa58 0046 880136bdffd8 000145c0
[ 254.635304] 880136bdffd8 000145c0 880136ad8000 88013fc14e88
[ 254.635304] 880037bd4380 880037bc5068 880037bd43b0 880037bd4380
[ 254.635304] Call Trace:
[ 254.635304] [] io_schedule+0x9d/0x140
[ 254.635304] [] get_request+0x1b5/0x790
[ 254.635304] [] ? wake_up_bit+0x30/0x30
[ 254.635304] [] blk_queue_bio+0x96/0x390
[ 254.635304] [] generic_make_request+0xe2/0x130
[ 254.635304] [] submit_bio+0x71/0x150
[ 254.635304] [] ? bio_alloc_bioset+0x1e8/0x2e0
[ 254.635304] [] _xfs_buf_ioapply+0x2bb/0x3d0 [xfs]
[ 254.635304] [] ? xlog_bdstrat+0x1f/0x50 [xfs]
[ 254.635304] [] xfs_buf_iorequest+0x46/0xa0 [xfs]
[ 254.635304] [] xlog_bdstrat+0x1f/0x50 [xfs]
[ 254.635304] [] xlog_sync+0x265/0x450 [xfs]
[ 254.635304] [] xlog_state_release_iclog+0x92/0xb0 [xfs]
[ 254.635304] [] _xfs_log_force+0x15a/0x290 [xfs]
[ 254.635304] [] ? __switch_to+0x136/0x490
[ 254.635304] [] xfs_log_force+0x26/0x80 [xfs]
[ 254.635304] [] xfs_log_worker+0x24/0x50 [xfs]
[ 254.635304] [] process_one_work+0x17b/0x460
[ 254.635304] [] worker_thread+0x11b/0x400
[ 254.635304] [] ? rescuer_thread+0x400/0x400
[ 254.635304] [] kthread+0xcf/0xe0
[ 254.635304] [] ? kthread_create_on_node+0x140/0x140
[ 254.635304] [] ret_from_fork+0x7c/0xb0
[ 254.635304] [] ? kthread_create_on_node+0x140/0x140
[ 254.635304] fio D 88013fc145c0 0 772770 0x
[ 254.635304] 8800bba4b8c8 0082 8800bba4bfd8 000145c0
[ 254.635304] 8800bba4bfd8 000145c0 8801376ff1c0 88013fc14e88
[ 254.635304] 880037bd4380 880037baba90 880037bd43b0 880037bd4380
[ 254.635304] Call Trace:
[ 254.635304] [] io_schedule+0x9d/0x140
[ 254.635304] [] get_request+0x1b5/0x790
[ 254.635304] [] ? wake_up_bit+0x30/0x30
[ 254.635304] [] blk_queue_bio+0x96/0x390
[ 254.635304] [] generic_make_request+0xe2/0x130
[ 254.635304] [] submit_bio+0x71/0x150
[ 254.635304] [] do_blockdev_direct_IO+0x14bc/0x2620
[ 254.635304] [] ? xfs_get_blocks+0x20/0x20 [xfs]
[ 254.635304] [] __blockdev_direct_IO+0x55/0x60
[ 254.635304] [] ? xfs_get_blocks+0x20/0x20 [xfs]
[ 254.635304] [] xfs_vm_direct_IO+0x15c/0x180 [xfs]
[ 254.635304] [] ? xfs_get_blocks+0x20/0x20 [xfs]
[ 254.635304] [] generic_file_aio_read+0x6d3/0x750
[ 254.635304] [] ? ktime_get_ts+0x48/0xe0
[ 254.635304] [] ? delayacct_end+0x8f/0xb0
[ 254.635304] [] ? down_read+0x12/0x30
[ 254.635304] [] xfs_file_aio_read+0x154/0x2e0 [xfs]
[ 254.635304] [] ? xfs_file_splice_read+0x140/0x140 [xfs]
[ 254.635304] [] do_io_submit+0x3b8/0x840
[ 254.635304] [] SyS_io_submit+0x10/0x20
[ 254.635304] [] system_call_fastpath+0x16/0x1b

Thread 3 (Thread 0x7f4250f50700 (LWP 11955)):
#0 0x7f4253d1a897 in ioctl () from /lib64/libc.so.6
#1 0x7f4257f8adf9 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7f4258e2aa90, type=type@entry=44672) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1796
#2 0x7f4257f8af35 in kvm_cpu_exec (cpu=cpu@entry=0x7f4258e2aa90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1681
#3 0x7f4257f3071c in qemu_kvm_cpu_thread_fn (arg=0x7f4258e2aa90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#4 0x7f4253fe8f3a in start_thread () from /lib64/libpthread.so.0
#5 0x7f4253d22dad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f424b5ff700 (LWP 11957)):
#0 0x7f4253fecd0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f425802c019 in qemu_cond_wait (cond=cond@entry=0x7f4258f0cf
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> I've reproduced your IO hang with 2.0 and both 9b1786829aefb83f37a8f3135e3ea91c56001b56 and a096b3a6732f846ec57dc28b47ee9435aa0609bf applied. Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 indeed fixes the problem (but reintroduces the block-migration hang).
>
> It seems like a qemu bug rather than a guest problem, as the no-kvmclock parameter makes no difference. IO just stops; all qemu IO threads die off. Almost like it forgets to migrate them :-)

Some more info:

a) 2.0 + 9b1786829aefb83f37a8f3135e3ea91c56001b56 + a096b3a6732f846ec57dc28b47ee9435aa0609bf = hangs
b) 2.0 + 9b1786829aefb83f37a8f3135e3ea91c56001b56 = works
c) 2.0 + 9b1786829aefb83f37a8f3135e3ea91c56001b56 + move cpu_synchronize_state to migration.c = works

Tested with NFS (qcow2) + cache=none. IO is dead only for the disk that was being written to during migration. I.e. if my test VM has two disks, vda and vdb, and I'm running fio on vdb and it hangs after migration, I can still issue writes to vda.

Recreation steps:
1. Create a VM.
2. Run fio (Andrey's config).
3. Live migrate the VM a couple of times.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> Yes, exactly. An ISCSI-based setup can take some minutes to deploy, given a prepared image, and I have a one hundred percent hit rate for the original issue with it.
>
> I've reproduced your IO hang with 2.0 and both 9b1786829aefb83f37a8f3135e3ea91c56001b56 and a096b3a6732f846ec57dc28b47ee9435aa0609bf applied. Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 indeed fixes the problem (but reintroduces the block-migration hang).
>
> It seems like a qemu bug rather than a guest problem, as the no-kvmclock parameter makes no difference. IO just stops; all qemu IO threads die off. Almost like it forgets to migrate them :-)
>
> I'm attaching a backtrace from the guest kernel and qemu, and the qemu command line. Going to compile 2.1-rc.

2.1-rc2 behaves exactly the same. Interestingly enough, resetting the guest system causes I/O to work again. So it's not qemu that hangs on IO; rather, it fails to notify the guest about completed operations that were issued during migration. And it's somehow caused by calling cpu_synchronize_all_states() inside kvmclock_vm_state_change().

As for testing with cache=writeback, I'll try to set up some iscsi to test it.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> 2.1-rc2 behaves exactly the same. Interestingly enough, resetting the guest system causes I/O to work again. So it's not qemu that hangs on IO; rather, it fails to notify the guest about completed operations that were issued during migration. And it's somehow caused by calling cpu_synchronize_all_states() inside kvmclock_vm_state_change().
>>
>> As for testing with cache=writeback, I'll try to set up some iscsi to test it.
>
> Awesome, thanks! AFAIK you'll not be able to use write cache with iscsi for migration. A VM which had a reset before always hangs; freshly launched ones have a chance to be migrated successfully. And yes, at a glance it looks like a lower layer forgetting to notify the driver about some operations.

Andrey, could you try the attached patch? It's an incredibly ugly workaround that calls cpu_synchronize_all_states() in a way that bypasses the lazy execution logic. But it works for me.

If that works for you as well, it's somehow related to the lazy execution of cpu_synchronize_all_states.

--
mg

diff -ru qemu-2.1.0-rc2/cpus.c qemu-2.1.0-rc2-fixed/cpus.c
--- qemu-2.1.0-rc2/cpus.c 2014-07-15 23:49:14.0 +0200
+++ qemu-2.1.0-rc2-fixed/cpus.c 2014-07-17 15:09:09.306696284 +0200
@@ -505,6 +505,15 @@
     }
 }
 
+void cpu_synchronize_all_states_always(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        cpu_synchronize_state_always(cpu);
+    }
+}
+
 void cpu_synchronize_all_post_reset(void)
 {
     CPUState *cpu;
diff -ru qemu-2.1.0-rc2/hw/i386/kvm/clock.c qemu-2.1.0-rc2-fixed/hw/i386/kvm/clock.c
--- qemu-2.1.0-rc2/hw/i386/kvm/clock.c 2014-07-15 23:49:14.0 +0200
+++ qemu-2.1.0-rc2-fixed/hw/i386/kvm/clock.c 2014-07-17 15:08:25.627063756 +0200
@@ -126,7 +126,7 @@
             return;
         }
 
-        cpu_synchronize_all_states();
+        cpu_synchronize_all_states_always();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff -ru qemu-2.1.0-rc2/include/sysemu/cpus.h qemu-2.1.0-rc2-fixed/include/sysemu/cpus.h
--- qemu-2.1.0-rc2/include/sysemu/cpus.h 2014-07-15 23:49:14.0 +0200
+++ qemu-2.1.0-rc2-fixed/include/sysemu/cpus.h 2014-07-17 15:09:23.256578916 +0200
@@ -7,6 +7,7 @@
 void pause_all_vcpus(void);
 void cpu_stop_current(void);
 
+void cpu_synchronize_all_states_always(void);
 void cpu_synchronize_all_states(void);
 void cpu_synchronize_all_post_reset(void);
 void cpu_synchronize_all_post_init(void);
diff -ru qemu-2.1.0-rc2/include/sysemu/kvm.h qemu-2.1.0-rc2-fixed/include/sysemu/kvm.h
--- qemu-2.1.0-rc2/include/sysemu/kvm.h 2014-07-15 23:49:14.0 +0200
+++ qemu-2.1.0-rc2-fixed/include/sysemu/kvm.h 2014-07-17 15:11:54.855303171 +0200
@@ -346,9 +346,11 @@
 #endif /* NEED_CPU_H */
 
 void kvm_cpu_synchronize_state(CPUState *cpu);
+void kvm_cpu_synchronize_state_always(CPUState *cpu);
 void kvm_cpu_synchronize_post_reset(CPUState *cpu);
 void kvm_cpu_synchronize_post_init(CPUState *cpu);
 
+
 /* generic hooks - to be moved/refactored once there are more users */
 
 static inline void cpu_synchronize_state(CPUState *cpu)
@@ -358,6 +360,13 @@
     }
 }
 
+static inline void cpu_synchronize_state_always(CPUState *cpu)
+{
+    if (kvm_enabled()) {
+        kvm_cpu_synchronize_state_always(cpu);
+    }
+}
+
 static inline void cpu_synchronize_post_reset(CPUState *cpu)
 {
     if (kvm_enabled()) {
diff -ru qemu-2.1.0-rc2/kvm-all.c qemu-2.1.0-rc2-fixed/kvm-all.c
--- qemu-2.1.0-rc2/kvm-all.c 2014-07-15 23:49:14.0 +0200
+++ qemu-2.1.0-rc2-fixed/kvm-all.c 2014-07-17 15:14:04.884208826 +0200
@@ -1652,6 +1652,13 @@
     s->coalesced_flush_in_progress = false;
 }
 
+static void do_kvm_cpu_synchronize_state_always(void *arg)
+{
+    CPUState *cpu = arg;
+
+    kvm_arch_get_registers(cpu);
+}
+
 static void do_kvm_cpu_synchronize_state(void *arg)
 {
     CPUState *cpu = arg;
@@ -1669,6 +1676,11 @@
     }
 }
 
+void kvm_cpu_synchronize_state_always(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_kvm_cpu_synchronize_state_always, cpu);
+}
+
 void kvm_cpu_synchronize_post_reset(CPUState *cpu)
 {
     kvm_arch_put_registers(cpu, KVM_PUT_RESET_STATE);
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On 2014-07-17 21:18, Dr. David Alan Gilbert wrote:
> I don't know if this is the same case, but Gerd showed me a migration failure that might be related. 2.0 seems OK, 2.1-rc0 is broken (and I've not found another working point in between yet). The test case involves booting a fedora livecd (using an IDE CDROM device) and after the migration we're seeing squashfs errors and stuff gently falling apart.

Perhaps you could try testing the workaround patch I sent earlier? It's not a proposal for inclusion, just a test patch that seems to fix the IO hang for me.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> could you try the attached patch? It's an incredibly ugly workaround that calls cpu_synchronize_all_states() in a way that bypasses the lazy execution logic. But it works for me. If that works for you as well, it's somehow related to the lazy execution of cpu_synchronize_all_states.
>
> Yes, it is working well with writeback cache too.

Does it fix the problem with libvirt migration timing out for you as well?

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> Does it fix the problem with libvirt migration timing out for you as well?
>
> Oh, forgot to mention - yes, all migration-related problems are fixed. Though the release is in a freeze phase right now, I'd like to ask the maintainers to consider the possibility of fixing the problem on top of the current tree instead of just rolling back the problematic snippet.

Paolo, if the patch in its current form is not acceptable to you for inclusion, I'll try to rewrite it according to your comments.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> The name of the hack^Wfunction is tricky, because compared to do_kvm_cpu_synchronize_state there are three things you change:
>
> 1) you always synchronize the state
>
> 2) the next call to do_kvm_cpu_synchronize_state will do kvm_arch_get_registers

Yes.

> 3) the next CPU entry will call kvm_arch_put_registers:
>
>     if (cpu->kvm_vcpu_dirty) {
>         kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
>         cpu->kvm_vcpu_dirty = false;
>     }

But I don't set cpu->kvm_vcpu_dirty anywhere (?).

> I still lean very much towards reverting the patches now. We can reapply them, fixed, in 2.1.1.

That's probably a good idea.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On 2014-07-18 11:37, Paolo Bonzini wrote:
> On 18/07/2014 11:32, Marcin Gibuła wrote:
>>> 3) the next CPU entry will call kvm_arch_put_registers:
>>>
>>>     if (cpu->kvm_vcpu_dirty) {
>>>         kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
>>>         cpu->kvm_vcpu_dirty = false;
>>>     }
>>
>> But I don't set cpu->kvm_vcpu_dirty anywhere (?).
>
> Yeah, the next CPU entry will *not* call kvm_arch_put_registers with your change. It will call it with vanilla cpu_synchronize_all_states().

That's because in kvmclock it's used only to read cpu registers, not to edit them.

Now, because making this call "invisible" makes it work, I'm speculating that the following happens:

[migration starts]
kvmclock: calls cpu_synchronize_all_states()
somewhere in qemu: completes IO
somewhere in qemu: calls cpu_synchronize_all_states() <- old state

Is it (or something similar) possible? I haven't dug deep enough into the internals yet, but perhaps you could tell me if that's the right direction?

--
mg
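[Editorial note: for readers following along, this is roughly the lazy logic being discussed, as it looked in 2.1-era kvm-all.c - a from-memory sketch, not verbatim:]

static void do_kvm_cpu_synchronize_state(void *arg)
{
    CPUState *cpu = arg;

    if (!cpu->kvm_vcpu_dirty) {
        kvm_arch_get_registers(cpu);  /* refresh QEMU's copy of the vcpu state */
        cpu->kvm_vcpu_dirty = true;   /* further syncs become no-ops until the vcpu re-enters KVM */
    }
}

void kvm_cpu_synchronize_state(CPUState *cpu)
{
    if (!cpu->kvm_vcpu_dirty) {
        run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
    }
}

[So once kvmclock's early call has set kvm_vcpu_dirty, the later cpu_synchronize_all_states() call made during migration does not re-read anything from KVM.]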
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On 29.07.2014 18:58, Paolo Bonzini wrote:
> On 18/07/2014 10:48, Paolo Bonzini wrote:
>> It is easy to find out if the "fix" is related to 1 or 2/3: just write
>>
>>     if (cpu->kvm_vcpu_dirty) {
>>         printf("do_kvm_cpu_synchronize_state_always: look at 2/3\n");
>>         kvm_arch_get_registers(cpu);
>>     } else {
>>         printf("do_kvm_cpu_synchronize_state_always: look at 1\n");
>>     }
>>
>> To further refine between 2 and 3, I suppose you can set a breakpoint on cpu_synchronize_all_states and kvm_cpu_exec, and see which is called first after cpu_synchronize_all_states_always.
>
> Marcin, have you ever gotten round to doing this?

Source side of migration, without my ugly hack:

called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
called kvm_cpu_synchronize_state: vcpu dirty
called kvm_cpu_synchronize_state: vcpu dirty
shutting down

without it:

called do_kvm_cpu_synchronize_state_always
called do_kvm_cpu_synchronize_state_always
called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
shutting down

So it's probably about 2 from your list ("the next call to do_kvm_cpu_synchronize_state will do kvm_arch_get_registers"). I've tapped into kvm_cpu_exec() to find out if it calls kvm_arch_put_registers(), but nothing was logged during migration, so it's probably not 3.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On 2014-07-30 15:38, Paolo Bonzini wrote:
> On 30/07/2014 14:02, Marcin Gibuła wrote:
>> without it:

s/without/with/ of course...

>> called do_kvm_cpu_synchronize_state_always
>> called do_kvm_cpu_synchronize_state_always
>> called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
>> called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
>> shutting down
>>
>> So it's probably about 2 from your list ("the next call to do_kvm_cpu_synchronize_state will do kvm_arch_get_registers").
>
> Can you dump *env before and after the call to kvm_arch_get_registers?

Yes, but it seems they are equal - I used memcmp() to compare them.

Is there any other side effect that cpu_synchronize_all_states() may have? The second caller of this function is qemu_savevm_state_complete().

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> Can you dump *env before and after the call to kvm_arch_get_registers?
>
> Yes, but it seems they are equal - I used memcmp() to compare them.
>
> Is there any other side effect that cpu_synchronize_all_states() may have?

I think I found it. The reason for the hang is that when the second call to kvm_arch_get_registers() is skipped, it also skips kvm_get_apic(), which updates cpu->apic_state.

--
mg
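[Editorial note: to make the finding concrete - on x86, kvm_arch_get_registers() is what pulls the in-kernel APIC state back into QEMU. A simplified sketch of the relevant call chain (paraphrased; the real function also reads GPRs, sregs, MSRs, vcpu events, etc., and the exact signatures differ):]

int kvm_arch_get_registers(CPUState *cs)
{
    /* ... general-purpose registers, sregs, MSRs, ... */

    /* Copies the in-kernel APIC registers into cpu->apic_state.  If the
     * lazy-sync wrapper skips kvm_arch_get_registers(), this refresh is
     * skipped too, and migration saves a stale APIC state - matching the
     * observed "guest never sees its I/O completion interrupts" symptom. */
    kvm_get_apic(cs);

    /* ... vcpu events, debug registers, ... */
    return 0;
}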
Re: [Qemu-devel] [PATCH] linux-aio: avoid deadlock in nested aio_poll() calls
On 2014-08-04 17:56, Stefan Hajnoczi wrote:
> If two Linux AIO request completions are fetched in the same io_getevents() call, QEMU will deadlock if request A's callback waits for request B to complete using an aio_poll() loop. This was reported to happen with the mirror blockjob.

s/mirror/commit/

> This patch moves completion processing into a BH and makes it resumable. Nested event loops can resume completion processing so that request B will complete and the deadlock will not occur.
>
> Cc: Kevin Wolf
> Cc: Paolo Bonzini
> Cc: Ming Lei
> Cc: Marcin Gibuła
> Reported-by: Marcin Gibuła
> Signed-off-by: Stefan Hajnoczi

I'll test it tomorrow.

--
mg
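[Editorial note: the core idea of the patch, as described above, can be sketched like this - a simplified illustration of the technique, not the actual linux-aio.c code; the struct layout and the laio_* helper names are invented for the example:]

#include <libaio.h>     /* io_context_t, struct io_event, io_getevents() */
#include "block/aio.h"  /* QEMUBH, qemu_bh_schedule(), qemu_bh_cancel() */

struct laio_state {
    io_context_t ctx;             /* libaio context */
    QEMUBH *completion_bh;        /* bottom half driving completions */
    struct io_event events[128];  /* batch fetched by io_getevents() */
    int event_idx;                /* resume point within the batch */
    int event_max;                /* number of valid entries in events[] */
};

static void laio_completion_bh(void *opaque)
{
    struct laio_state *s = opaque;

    /* Fetch a new batch only when the previous one is fully consumed
     * (error handling elided for brevity). */
    if (s->event_idx == s->event_max) {
        s->event_max = io_getevents(s->ctx, 0, 128, s->events, NULL);
        s->event_idx = 0;
    }

    /* Keep the BH scheduled while events remain, so a nested aio_poll()
     * entered from a completion callback re-runs this BH and drains the
     * rest of the batch - letting request B finish while A's callback
     * is still waiting. */
    qemu_bh_schedule(s->completion_bh);

    /* Progress lives in s->event_idx, not on the stack, so processing
     * is resumable across nested event loops. */
    while (s->event_idx < s->event_max) {
        struct io_event *ev = &s->events[s->event_idx++];
        laio_process_completion(s, ev);  /* hypothetical helper; may nest aio_poll() */
    }

    qemu_bh_cancel(s->completion_bh);    /* batch drained; stop for now */
}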
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On 2014-07-31 13:27, Marcin Gibuła wrote:
>>> Can you dump *env before and after the call to kvm_arch_get_registers?
>>
>> Yes, but it seems they are equal - I used memcmp() to compare them.
>>
>> Is there any other side effect that cpu_synchronize_all_states() may have?
>
> I think I found it. The reason for the hang is that when the second call to kvm_arch_get_registers() is skipped, it also skips kvm_get_apic(), which updates cpu->apic_state.

Paolo, is this analysis deep enough for you?

I don't know if that can be fixed with the existing API, as cpu_synchronize_all_states() is an all-or-nothing kind of thing. Kvmclock needs it only to read the current cpu registers, so syncing everything is not really necessary. Perhaps exporting one of the kvm_arch_get_* functions would be enough. And it wouldn't mess with the lazy get/put.

On the other hand, if in the future any other driver adds cpu_synchronize_all_states() to its state change callback, it could result in the same error, so perhaps a more generic approach is needed.

--
mg
Re: [Qemu-devel] [PATCH] linux-aio: avoid deadlock in nested aio_poll() calls
On 04.08.2014 17:56, Stefan Hajnoczi wrote:
> If two Linux AIO request completions are fetched in the same io_getevents() call, QEMU will deadlock if request A's callback waits for request B to complete using an aio_poll() loop. This was reported to happen with the mirror blockjob.
>
> This patch moves completion processing into a BH and makes it resumable. Nested event loops can resume completion processing so that request B will complete and the deadlock will not occur.
>
> Cc: Kevin Wolf
> Cc: Paolo Bonzini
> Cc: Ming Lei
> Cc: Marcin Gibuła
> Reported-by: Marcin Gibuła
> Signed-off-by: Stefan Hajnoczi

Still hangs... The backtrace still looks like this:

Thread 1 (Thread 0x7f3d5313a900 (LWP 17440)):
#0 0x7f3d4f38f286 in ppoll () from /lib64/libc.so.6
#1 0x7f3d5347465b in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=, nfds=, timeout=) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/qemu-timer.c:314
#3 0x7f3d53475970 in aio_poll (ctx=ctx@entry=0x7f3d54270c00, blocking=blocking@entry=true) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/aio-posix.c:250
#4 0x7f3d534695e7 in bdrv_drain_all () at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block.c:1924
#5 0x7f3d5346fe1f in bdrv_close (bs=bs@entry=0x7f3d5579b340) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block.c:1820
#6 0x7f3d53470047 in bdrv_delete (bs=0x7f3d5579b340) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block.c:2094
#7 bdrv_unref (bs=0x7f3d5579b340) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block.c:5376
#8 0x7f3d5347030b in bdrv_drop_intermediate (active=active@entry=0x7f3d54635e20, top=top@entry=0x7f3d5579b340, base=base@entry=0x7f3d54d956b0, backing_file_str=0x7f3d54d95700 "/mnt/nfs/volumes/7c13c27f-0c48-4676-b075-6e8a3325383e/3785abe6-d2df-49da-9cba-e15cfce8e2af.qcow2") at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block.c:2643
#9 0x7f3d5335121a in commit_run (opaque=0x7f3d545cdac0) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block/commit.c:145
#10 0x7f3d5347ebca in coroutine_trampoline (i0=, i1=) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/coroutine-ucontext.c:118
#11 0x7f3d4f2f49f0 in ?? () from /lib64/libc.so.6
#12 0x7fff27d5ef50 in ?? ()
#13 0x in ?? ()

--
mg
Re: [Qemu-devel] [PATCH] linux-aio: avoid deadlock in nested aio_poll() calls
On 05.08.2014 16:26, Marcin Gibuła wrote:
> On 04.08.2014 17:56, Stefan Hajnoczi wrote:
>> If two Linux AIO request completions are fetched in the same io_getevents() call, QEMU will deadlock if request A's callback waits for request B to complete using an aio_poll() loop. This was reported to happen with the mirror blockjob.
>>
>> This patch moves completion processing into a BH and makes it resumable. Nested event loops can resume completion processing so that request B will complete and the deadlock will not occur.
>>
>> Cc: Kevin Wolf
>> Cc: Paolo Bonzini
>> Cc: Ming Lei
>> Cc: Marcin Gibuła
>> Reported-by: Marcin Gibuła
>> Signed-off-by: Stefan Hajnoczi
>
> Still hangs...

I'm sorry, ignore this comment - I had built my test qemu without aio support. Retesting now.

--
mg
Re: [Qemu-devel] [PATCH v2 0/2] thread-pool: avoid fd usage and fix nested aio_poll() deadlock
On 15.07.2014 17:17, Paolo Bonzini wrote:
> On 15/07/2014 16:44, Stefan Hajnoczi wrote:
>> v2:
>>  * Leave BH scheduled so that the code can be simplified [Paolo]
>>
>> These patches convert thread-pool.c from EventNotifier to QEMUBH. They then solve the deadlock when nested aio_poll() calls are made.
>>
>> Please speak out whether you want this in QEMU 2.1 or not. I'm not aware of the nested aio_poll() deadlock ever having been reported, so maybe we can defer to QEMU 2.2.
>
> It was reported as a hang in block_commit. Marcin, can you please test these patches?

Sorry for the late answer - yes, it seems to fix the block_commit hang when using the thread pool.

--
mg
Re: [Qemu-devel] [PATCH] linux-aio: avoid deadlock in nested aio_poll() calls
On 2014-08-04 17:56, Stefan Hajnoczi wrote:
> If two Linux AIO request completions are fetched in the same io_getevents() call, QEMU will deadlock if request A's callback waits for request B to complete using an aio_poll() loop. This was reported to happen with the mirror blockjob.
>
> This patch moves completion processing into a BH and makes it resumable. Nested event loops can resume completion processing so that request B will complete and the deadlock will not occur.
>
> Cc: Kevin Wolf
> Cc: Paolo Bonzini
> Cc: Ming Lei
> Cc: Marcin Gibuła
> Reported-by: Marcin Gibuła
> Signed-off-by: Stefan Hajnoczi

This patch fixes the block-commit hang when using linux-aio, so:

Tested-by: Marcin Gibuła

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
On 2014-03-27 23:52, Chris Dunlop wrote:
> Hi, I have a problem where I migrate a linux guest VM, and on the receiving side the guest goes to 100% cpu as seen by the host, and the guest itself is unresponsive, e.g. not responding to ping etc. The only way out I've found is to destroy the guest. This seems to only happen if the guest has been idle for an extended period (e.g. overnight). I've migrated the guest 100 times in a row without any problems when the guest has been used "a little" (e.g. logging in and looking around; it's not doing anything normally).

Hi, I've seen a very similar problem on our installation. Have you tried running with kvm-clock explicitly disabled (either via no-kvmclock in the guest kernel or with -kvm-clock in qemu)?

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
>> I've seen a very similar problem on our installation. Have you tried running with kvm-clock explicitly disabled (either via no-kvmclock in the guest kernel or with -kvm-clock in qemu)?
>
> No, I haven't tried it yet (I've confirmed kvm-clock is currently being used). I'll have a look at it. Did it help your issue?

My results were inconclusive, but there was a guy two months ago who had the same problem, and disabling kvm-clock resolved it for him. I wonder if it'll help you as well.

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
> It's looking good so far, after a few migrations (it takes a while to test because I'm waiting at least 5 hours between migrations). I'll be happier once I've done a couple of weeks of this without any failures! Does anyone have any hints how to debug this thing? :(

I've tried to put a hung guest under gdb and found it's looping deep inside kernel time management functions. That disabling kvmclock helps suggests it is somehow related to kvmclock corruption during migration. It happens on both old and new versions of guest kernels.

Any hints from developers are welcome :)

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
> Can you give:
>
> 1) A backtrace from the guest
>
>     thread apply all bt full
>
> in gdb

You mean from gdb attached to the hung guest? I'll try to get it. From what I remember it looks rather "normal" - busy executing guest code.

> 2) What's the earliest/newest qemu versions you've seen this on?

1.4 - 1.6. Don't know about earlier versions because I didn't use migration on them. Haven't tried 1.7 yet (I know about the XBZRLE fixes, but it happened without it as well...).

> 3) What guest OS are you running?

All flavors of Centos, Ubuntu, Redhat, etc. Also Windows, but never seen a crash with Windows so far. It seems that the few people who also have this issue report success with kvmclock disabled (either in qemu or on the kernel command line).

> 4) What host OS are you running?

The distro is Gentoo based (with no crazy compiler options). I've been using kernels 3.4 - 3.10.

> 5) What CPU are you running on?

AMD Opteron(tm) Processor 6164 HE

> 6) What does your qemu command line look like?

Example VM:

/usr/bin/qemu-system-x86_64 -machine accel=kvm -name 3b5e37ea-04be-4a6b-8d63-f1a5853f2138 -S -machine pc-i440fx-1.5,accel=kvm,usb=off -cpu qemu64,+misalignsse,+abm,+lahf_lm,+rdtscp,+popcnt,+x2apic,-svm,+kvmclock -m 1024 -realtime mlock=on -smp 2,sockets=4,cores=12,threads=1 -uuid 3b5e37ea-04be-4a6b-8d63-f1a5853f2138 -smbios type=0,vendor=HAL 9000 -smbios type=1,manufacturer=cloud -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/3b5e37ea-04be-4a6b-8d63-f1a5853f2138.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -no-hpet -no-kvm-pit-reinjection -no-shutdown -boot menu=off -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/dev/stor1c/2e7fd7aa-8588-47ed-a091-af2b81c9e935,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:11:11:11:11,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/f16x86_64.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1 -device usb-tablet,id=input0 -vnc 0.0.0.0:4,password -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -sandbox on

I've tried playing with a different CPU model (Opteron_G3) and flags; it didn't make any difference.

> 7) How exactly are you migrating?

Via libvirt live migration. Seen it with and without XBZRLE enabled.

> 8) You talk about having to wait a few hours to trigger it - do you have a more exact description of a test?

Yes, that's where it gets weird. I've never seen this on a fresh VM. It needs to be idle for a couple of hours at least. And even then it doesn't always hang.

> 9) Is there any output from qemu stderr/stdout in your qemu logs?

Nothing unusual. From QEMU's point of view the guest is up and running. Only its OS is hung (but not panicked; there is no backtrace, oops or BUG on its screen).

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
On 02.04.2014 11:39, Dr. David Alan Gilbert wrote:
> * Marcin Gibuła (m.gib...@beyond.pl) wrote:
>>> Can you give:
>>>
>>> 1) A backtrace from the guest
>>>
>>>     thread apply all bt full
>>>
>>> in gdb
>>
>> You mean from gdb attached to the hung guest? I'll try to get it. From what I remember it looks rather "normal" - busy executing guest code.
>
> yes; if you can send it a sysrq to trigger a backtrace it might also be worth a try - I'm just trying to find what the guest is really doing when it's apparently 'hung'.

IIRC the VM doesn't respond to the sysrq key sequence. It doesn't respond to anything, actually, but NMI. I tried to do inject-nmi. The VM's kernel responded with the timestamped message "Uhhuh. NMI received. Dazed and confused, but trying to continue". That timestamp never changes - it's like time is frozen in the VM.

I'll try to find my notes from this gdb session.

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
>> Yes, that's where it gets weird. I've never seen this on a fresh VM. It needs to be idle for a couple of hours at least. And even then it doesn't always hang.
>
> So your OS is just sitting at a text console, running nothing special? When you reboot after the migration what's the last thing you see in the guests logs? Is there anything from after the migration?

Yes, it's completely idle. After reboot there is nothing in the logs. I've dumped the memory of one of the hung test VMs and found the kernel message buffer. The last entries were:

init: failsafe main process (659) killed by TERM signal
init: plymouth-upstart-bridge main process (651) killed by TERM signal
Clocksource tsc unstable (delta = 470666274 ns)
Uhhuh. NMI received for unknown reason 30 on CPU 0.
Do you have a strange power saving mode enabled?I: Dazed and confused, but trying to continue
Uhhuh. NMI received for unknown reason 20 on CPU 0.
Do you have a strange power saving mode enabled?I: Dazed and confused, but trying to continue
<0>Dazed and confused, but trying to continue

I've tried to disassemble where the VM kernel (3.8.something from Ubuntu) is spinning (using the qemu monitor, registers info, and symbols from the guest kernel) and it was a loop inside the __run_timers function from kernel/timer.c:

while (time_after_eq(jiffies, base->timer_jiffies)) {
...
}

However, my disassembly and qemu debugging skills are limited - would it help if I dump the memory of a broken VM and send it to you somehow?

--
mg
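[Editorial note: this loop is consistent with the clock-jump theory. A paraphrased sketch of the relevant part of kernel/timer.c (simplified; the real function expires pending timers in the loop body):]

/* base->timer_jiffies is the last tick this timer base has processed.
 * If jiffies suddenly jumps far into the future - e.g. because the
 * clocksource went wrong across a migration - this loop has an
 * enormous number of iterations to catch up on, one jiffy at a time,
 * which from the outside looks like a CPU pegged at 100%. */
while (time_after_eq(jiffies, base->timer_jiffies)) {
        /* ... run timers that expire at base->timer_jiffies ... */
        base->timer_jiffies++;
}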
[Qemu-devel] qemu 2.0.0-rc2 crash
Hi, I've been playing with QEMU 2.0-rc2 and found a crash that isn't there in 1.7.1. The virtual machine is created via libvirt, and when I query it with 'dommemstat' it crashes with the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x7f5883655c0a in object_class_dynamic_cast (class=0x7f588618fbb0, typename=typename@entry=0x7f58837ebe54 "object") at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c:525
525 /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c: No such file or directory.
(gdb) bt
#0 0x7f5883655c0a in object_class_dynamic_cast (class=0x7f588618fbb0, typename=typename@entry=0x7f58837ebe54 "object") at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c:525
#1 0x7f5883655da5 in object_dynamic_cast (obj=0x7f58861604c0, typename=typename@entry=0x7f58837ebe54 "object") at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c:456
#2 0x7f5883657d6e in object_resolve_abs_path (parent=<optimized out>, parts=parts@entry=0x7f5886352ad0, typename=typename@entry=0x7f58837ebe54 "object", index=index@entry=1) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c:1244
#3 0x7f5883657f20 in object_resolve_path_type (path=<optimized out>, typename=0x7f58837ebe54 "object", ambiguous=0x7fff1ccab257) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c:1312
#4 0x7f5883652d7f in qmp_qom_list (path=0x7f588615c9a0 "//machine/i440fx/pci.0/child[9]", errp=errp@entry=0x7fff1ccab290) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qmp.c:201
#5 0x7f588364dd55 in qmp_marshal_input_qom_list (mon=<optimized out>, qdict=, ret=0x7fff1ccab310) at qmp-marshal.c:2490
#6 0x7f58836ef4e8 in qmp_call_cmd (params=0x7f58893626b0, mon=0x7f5885c9ec90, cmd=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/monitor.c:4760
#7 handle_qmp_command (parser=, tokens=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/monitor.c:4826
#8 0x7f588378289a in json_message_process_token (lexer=0x7f5885ca00a0, token=0x7f58861a0500, type=JSON_OPERATOR, x=95, y=20) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qobject/json-streamer.c:87
#9 0x7f5883797c4f in json_lexer_feed_char (lexer=lexer@entry=0x7f5885ca00a0, ch=125 '}', flush=flush@entry=false) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qobject/json-lexer.c:303
#10 0x7f5883797d96 in json_lexer_feed (lexer=0x7f5885ca00a0, buffer=, size=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qobject/json-lexer.c:356
#11 0x7f5883782ab1 in json_message_parser_feed (parser=<optimized out>, buffer=, size=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qobject/json-streamer.c:110
#12 0x7f58836ed593 in monitor_control_read (opaque=, buf=, size=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/monitor.c:4847
#13 0x7f588363d4e1 in qemu_chr_be_write (len=, buf=0x7fff1ccab4f0 "}", s=0x7f5885caf0b0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-char.c:165
#14 tcp_chr_read (chan=, cond=, opaque=0x7f5885caf0b0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-char.c:2487
#15 0x7f58814d0b75 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#16 0x7f588360b0e8 in glib_pollfds_poll () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:190
#17 os_host_main_loop_wait (timeout=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:235
#18 main_loop_wait (nonblocking=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:484
#19 0x7f58834dbb6e in main_loop () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/vl.c:2051
#20 main (argc=, argv=, envp=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/vl.c:4507

Virtual machine command line:

LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/ USER=root QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name f1b3b8b7-7b0e-4eab-afef-06d577d6544d -S -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,-kvmclock -m 4096 -realtime mlock=on -smp 4,sockets=2,cores=10,threads=1 -uuid f1b3b8b7-7b0e-4eab-afef-06d577d6544d -smbios type=0,vendor=HAL 9000 -smbios type=1,manufacturer=cloud -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f1b3b8b7-7b0e-4eab-afef-06d577d6544d.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,
Re: [Qemu-devel] qemu 2.0.0-rc2 crash
On 2014-04-10 15:43, Marcel Apfelbaum wrote:
> On Thu, 2014-04-10 at 14:55 +0200, Marcin Gibuła wrote:
>> Hi, I've been playing with QEMU 2.0-rc2 and found a crash that isn't there in 1.7.1.
> Hi Marcin, thanks for reporting the bug! Do you have a development environment? If you do, and the reproduction is fast (and you already have a setup), a git bisect to find the problematic commit would be appreciated.

Hi, yes, it's a development environment. If you could point me to some quick guide to bisecting qemu, I'll be happy to do it.
-- mg
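For reference, a typical qemu bisect session looks roughly like this (a sketch - the repository URL, tag names and configure flags are assumptions to adapt to your checkout):

git clone git://git.qemu.org/qemu.git && cd qemu
git bisect start
git bisect bad v2.0.0-rc2        # first version known to crash
git bisect good v1.7.1           # last version known to work
# at each step git checks out a candidate commit; build and test it:
./configure --target-list=x86_64-softmmu && make -j8
# run the dommemstat test against the new binary, then mark the result:
git bisect good                  # or: git bisect bad
# repeat until git names the first bad commit, then clean up:
git bisect reset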
Re: [Qemu-devel] troubleshooting live migration
> I tried -no-hpet, was still able to replicate the 'lapic' issue. I find it interesting that I can only trigger it if the vm has been running awhile.

Hi, I've seen identical crashes with live migration in our environment. It looks identical - the VM has to be idle for some time, and after migration the CPU is at 100% and the VM is dead. All migration happens between identical hardware. I don't think I've ever had a Windows guest crashing like this, and I think this is somehow related to kvmclock. I've tried to debug the qemu guest process and, from what I can tell, its kernel is busy-looping in some time-management-related functions. Could you try to reproduce this issue with -no-kvmclock? Our testing environment is currently offline, so I can't test it myself. We also use a 3.10 kernel (though 3.8 wasn't working either) and struggled with this issue with qemu 1.4, 1.5 and 1.6. Didn't test 1.7. Also, we're using AMD CPUs, so it seems to be platform-independent.
-- mg
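For reference, kvmclock can also be disabled per-VM by masking the CPUID feature, as the command lines later in this thread do (-cpu SandyBridge,-kvmclock). With libvirt, the equivalent guest XML is believed to be the following (verify against your libvirt version):

<clock offset='utc'>
  <timer name='kvmclock' present='no'/>
</clock>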
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
On 06.02.2014 15:03, Stefan Priebe - Profihost AG wrote:
> some more things which happen during migration:
> php5.2[20258]: segfault at a0 ip 00740656 sp 7fff53b694a0 error 4 in php-cgi[40+6d7000]
> php5.2[20249]: segfault at c ip 7f1fb8ecb2b8 sp 7fff642d9c20 error 4 in ZendOptimizer.so[7f1fb8e71000+147000]
> cron[3154]: segfault at 7f0008a70ed4 ip 7fc890b9d440 sp 7fff08a6f9b0 error 4 in libc-2.13.so[7fc890b67000+182000]

Hi, I've seen memory corruption after live (and offline) migrations as well. In our environment it mostly (but not only) shows up as timer corruption - the guest hangs or has an insane date in the future. But I've seen segfaults and oopses as well. Sadly, it's very hard for me to reproduce reliably, but it occurs on all types of Linux guests - all versions of Ubuntu, CentOS, Debian, etc. - so it doesn't seem to be tied to a specific guest kernel version. I've never seen Windows crashing, though. There was another guy here on qemu-devel who had a similar issue and fixed it by running the guest with no-kvmclock. I've tested qemu 1.4 - 1.6 and kernels 3.4 - 3.10.
-- mg
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
>> do you use xbzrle for live migration?
> no - I'm really stuck right now with this. Biggest problem: I can't reproduce it with test machines ;-(
> Only being able to test on your production VMs isn't fun; is it possible for you to run an extra program on these VMs - e.g. if we came up with a simple (userland) memory test?
> You mean to reproduce? I already tried https://code.google.com/p/stressapptest/ while migrating on a test VM, but this works fine. I also tried running a mysql bench while migrating on a test VM, and this works too ;-(

Have you tried letting the test VM run idle for some time before migrating? (like 18-24 hours) Having the same (or very similar) problem, I had better luck reproducing it by not using freshly started VMs.
-- mg
Re: [Qemu-devel] migration question: disk images on nfs server
> For NFS you need to use the sync mount option to force the NFS client to sync to the server on writes.

Isn't opening with O_DIRECT enough? (for the Linux NFS client, at least)
-- mg
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
>> You mean to reproduce?
> I'm more interested in seeing what type of corruption is happening; if you've got a test VM that corrupts memory, and we can run a program in that VM that writes a known pattern into memory and checks it, then see what changed after migration, it might give a clue. But obviously this would only be of any use if run on the VM that actually fails.

Hi, seeing a similar issue in my company, I would be happy to run such tests. Do you have any test suite I could run, or some leads on how to write it?
-- mg
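Something like the following might do as a starting point - a minimal sketch of my own, not an existing suite; it assumes the corruption would show up in ordinary anonymous memory, and uses a position-dependent pattern so a hit identifies both the damaged offset and where the bad data came from:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>

#define MEM_SIZE (512UL * 1024 * 1024)   /* adjust to the guest's RAM */

int main(void)
{
    uint64_t *buf = malloc(MEM_SIZE);
    size_t n = MEM_SIZE / sizeof(uint64_t);
    size_t i;

    if (!buf) {
        perror("malloc");
        return 1;
    }
    /* Fill with a position-dependent pattern. */
    for (i = 0; i < n; i++) {
        buf[i] = (uint64_t)i * 0x9E3779B97F4A7C15ULL;
    }
    /* Re-verify forever; run this across the migration. */
    for (;;) {
        for (i = 0; i < n; i++) {
            uint64_t expect = (uint64_t)i * 0x9E3779B97F4A7C15ULL;
            if (buf[i] != expect) {
                printf("corruption at offset %zu: got %016llx, expected %016llx\n",
                       i * sizeof(uint64_t),
                       (unsigned long long)buf[i],
                       (unsigned long long)expect);
                buf[i] = expect; /* re-arm so further hits are visible */
            }
        }
        sleep(1); /* keep the VM mostly idle, as in the failing case */
    }
}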
Re: [Qemu-devel] migration question: disk images on nfs server
> It is more an NFS issue: if you have a file on NFS that two users on two different hosts are accessing (at least one of them writing to it), you will need to enforce the "sync" option. Even if you flush all the data and close the file, the NFS client can still have cached data that it didn't sync to the server.

Do you know if this applies to Linux O_DIRECT writes as well? From the comment in fs/nfs/direct.c:

 * When an application requests uncached I/O, all read and write requests
 * are made directly to the server; data stored or fetched via these
 * requests is not cached in the Linux page cache. The client does not
 * correct unaligned requests from applications. All requested bytes are
 * held on permanent storage before a direct write system call returns to
 * an application.
-- mg
Re: [Qemu-devel] migration question: disk images on nfs server
On 07.02.2014 14:36, Orit Wasserman wrote:
>> Do you know if this applies to Linux O_DIRECT writes as well?
> From the man page of open(2):
>
>   The behaviour of O_DIRECT with NFS will differ from local filesystems. Older kernels, or kernels configured in certain ways, may not support this combination. The NFS protocol does not support passing the flag to the server, so O_DIRECT I/O will bypass the page cache only on the client; the server may still cache the I/O. The client asks the server to make the I/O synchronous to preserve the synchronous semantics of O_DIRECT. Some servers will perform poorly under these circumstances, especially if the I/O size is small. Some servers may also be configured to lie to clients about the I/O having reached stable storage; this will avoid the performance penalty at some risk to data integrity in the event of server power failure. The Linux NFS client places no alignment restrictions on O_DIRECT I/O.
>
> To summarize: it depends on your kernel (NFS client).

So, assuming a new kernel (where NFS O_DIRECT translates to no caching on the client side) and a cache-coherent server, is that enough, or is the 'sync' mount option (or the O_SYNC flag) still required for some reason?
-- mg
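For context, this is the open() combination under discussion (a sketch; the path is made up). O_DIRECT requires aligned buffers and offsets, and adding O_SYNC/O_DSYNC - roughly what qemu's cache=directsync adds on top of cache=none - is what would force stable-storage semantics if plain O_DIRECT turned out not to be enough:

#define _GNU_SOURCE          /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* cache=none corresponds to O_DIRECT; uncomment O_SYNC to also
     * request a stable write on every request. */
    int fd = open("/mnt/nfs/test.img", O_RDWR | O_DIRECT /* | O_SYNC */);
    void *buf;

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* O_DIRECT needs aligned buffers; 4096 covers common block sizes. */
    if (posix_memalign(&buf, 4096, 4096)) {
        return 1;
    }
    memset(buf, 0xAB, 4096);
    if (pwrite(fd, buf, 4096, 0) != 4096) {
        perror("pwrite");
    }
    close(fd);
    return 0;
}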
Re: [Qemu-devel] Unresponsive linux guest once migrated
On 2014-04-15 20:53, Dr. David Alan Gilbert wrote:
> * Marcus (shadow...@gmail.com) wrote:
>> I can answer some of the questions. It's been 3 months or so since I looked into it. I ended up disabling kvmclock from the qemu command line and moving on. I saw it with CentOS 6.5 and Ubuntu 12.04 guests. Sending the guest to the BIOS CLI or PXE would not reproduce the issue. I didn't attempt an array of qemu versions, but I can say that it did occur on 1.7.0 and 1.6.1, with the host running kernel 3.10 or 3.12. The CPUs are Intel E5-2650.
> If you could test it with the latest 2.0.x-rc, that would be interesting to know, since you have a setup where it fails for you.

Hi, I'll soon be able to test it with the newest version. And if it fails - what next steps should I take to help debug it? The VM is usually pretty much dead and unresponsive to anything but NMI.
-- mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
On 2014-05-05 15:51, Alexander Graf wrote:
> When we migrate we ask the kernel about its current belief on what the guest time would be. However, I've seen cases where the kvmclock guest structure indicates a time more recent than the kvm returned time.

Hi, is it possible for kvmclock to jump forward as well? I have a reproducible case where, in about 1 out of 20 VM restores, the VM freezes for a couple of hours and then resumes with a date a few hundred years ahead. It happens only with kvmclock. And this patch seems to fix a very similar issue, so maybe it's all the same bug.
-- mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
>> is it possible for kvmclock to jump forward as well? I have a reproducible case where, in about 1 out of 20 VM restores, the VM freezes for a couple of hours and then resumes with a date a few hundred years ahead. It happens only with kvmclock. And this patch seems to fix a very similar issue, so maybe it's all the same bug.
> I'm fairly sure it is the exact same bug. Jumping backward is like jumping forward by a big amount :)

Hi, I've tested your patch on my test VM... don't know if it's pure luck or not, but it didn't hang in over 70 restores. The message "KVM Clock migrated backwards, using later time" fires every time, but the VM is healthy after resume.
-- mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
> What is the host clocksource? (cat /sys/devices/system/clocksource/clocksource0/current_clocksource)
tsc
> And kernel version?
3.12.17
But I've seen this problem on earlier versions as well (3.8, 3.10).
-- mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
> Yes, and it isn't. Any ideas why it's not? This patch really just uses the guest-visible kvmclock time rather than the host view of it on migration. There is definitely something very broken on the host's side, since it does return a smaller time than the guest-exposed interface indicates.

Don't know if it helps, but here are example values of time_at_migration and s->clock from your patch, taken over 5 restores of a saved VM that (used to) hang:

s->clock         time_at_migration
157082235125698  157113284546655
157082235125698  157113298196976
157082235125698  157113284615117
157082235125698  157113284486601
157082235125698  157113284479740

Now, comparing the guest's system time with and without the patch: on unpatched qemu the VM restores with date Apr 18 06:56:36; on patched qemu it says Apr 18 06:57:06.
-- mg
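(For reference, the gap is consistent with the observed guest times: 157113284546655 - 157082235125698 = 31049420957 ns, i.e. about 31 s, which matches the ~30 s difference between the two restore dates above.)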
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> Two options for making progress on this bug:
> 1. Debug bdrv_drain_all() and find out whether there are any I/O requests remaining.

Yes, there is one request pending on the active layer of the disk that is being committed (on the bs->tracked_requests list). The I/O threads die off because they have nothing to do... it seems that requests are somehow not committed into the threads. I tried hard (and will continue to try) to debug this, but documentation is limited :-) so ANY tips on where to look are welcome.

> 2. Post steps for reproducing this problem (exact command-lines or virsh commands used).

I'm using an application that talks to libvirt via the API, so I'll describe what it does:
1. Create a VM, boot a system. I'm using the iso from http://www.sysresccd.org
2. The VM has a mounted QCOW2 disk with the following hierarchy: [file1] -> [file2 (active)]. Both are qcow2 files.
3. Open a console and start the command: while true; do dd if=/dev/zero of=/dev/vdX bs=512k oflag=direct; done; - where vdX is, of course, the qcow2 disk described above.
4. Create a snapshot of file2 (virDomainSnapshotCreateXML). So now we have: [file1] -> [file2] -> [file3 (active)]
5. Wait a couple of seconds (so the snapshot fills up).
6. Commit file2 into file1 (virDomainBlockCommit).
7. During the commit, another thread uses virDomainGetBlockJobInfo() to query its progress.

Note - it doesn't always happen; I see about a 1-in-10 failure rate with this procedure. Do you want me to reproduce it manually with pure virsh? (A rough virsh equivalent is sketched below.)
-- mg
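For reference, a rough virsh equivalent of steps 4-7 might look like this (a sketch - the domain name, disk target and paths are made up, and option spellings should be checked against your libvirt version):

virsh snapshot-create-as DOM snap1 --disk-only --no-metadata \
      --diskspec vdb,file=/path/file3.qcow2
virsh blockcommit DOM vdb --top /path/file2.qcow2 --base /path/file1.qcow2
virsh blockjob DOM vdb --info     # poll progress, as in step 7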
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> What happens if you omit #7 virDomainGetBlockJobInfo()? Does it still hang 1/10 times?

Yes, it still hangs.

> Can you post the QEMU command-line so we know the precise VM configuration? (ps aux | grep qemu)

/usr/bin/qemu-system-x86_64 -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,-kvmclock -m 1536 -realtime mlock=on -smp 2,sockets=2,cores=10,threads=1 -uuid 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/68189c3c-02f6-4aae-88a2-5f13c5e6f53a.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/mnt/nfs/volumes/7dcbd9ba-f0bc-4d3c-9b5c-b2ac824584d5/b6ed3ffc-ddca-4f10-839b-81a5b1ce371f.qcow2,if=none,id=drive-virtio-disk5,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk5,id=virtio-disk5,bootindex=2 -drive file=/root/rescue.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:82:41:c9,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 0.0.0.0:2,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -sandbox on
-- mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> [QEMU command line quoted in full in the previous message]
> Please try disabling I/O limits on the drive and try again.

Still hangs.
-- mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> Please try disabling I/O limits on the drive and try again.

Is there anything else I could try? I've captured a trace of the hung VM with the following events traced:

bdrv_*
paio_*
thread_pool_*
commit_*
qcow2_*

plus debug code that prints the requests from bs->tracked_requests in the bdrv_requests_pending function. It's available here: http://filebin.net/tmscfay2pa/hanged-trace.gz (50 MB after decompression)
-- mg
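For anyone trying to capture a similar trace: with a QEMU built with the simple tracing backend, an event list like the one above goes into a plain file, one pattern per line, passed via -trace (a sketch; whether wildcard patterns are accepted depends on the QEMU version - otherwise list the events individually):

$ cat /tmp/events
bdrv_*
paio_*
thread_pool_*
commit_*
qcow2_*
$ qemu-system-x86_64 -trace events=/tmp/events,file=/tmp/trace.out ...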
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> 1. Debug bdrv_drain_all() and find out whether there are any I/O requests remaining.

I believe this is what happens:

Context 1:
- commit_one_iteration makes a write request (req A)
- request A is handed to an I/O thread, qemu_coroutine_yield() is called

Context 2:
- the VM makes a write request (req B)
- request B is inserted into bs->tracked_requests
- request B is handed to an I/O thread, qemu_coroutine_yield() is called
- request A is completed, the bdrv_co_io_em notification is called and jumps into context 1
- meanwhile request B is completed; the main thread is currently executing context 1

Context 1:
- calls bdrv_drain_all
- calls bdrv_requests_pending_all, which returns true, as bs->tracked_requests is not empty (it still has req B)
- calls aio_poll, which hangs, as req B has already completed but its notification has not been called yet (this part I'm not sure about, but it hangs forever for some reason...)

This is based on the traces and debug prints I collected. I've made a patch that moves bdrv_drop_intermediate() into a separate bottom half and couldn't recreate the hang after this. But it probably affects mirror_run as well, so I don't know if this is an acceptable solution for you.
-- mg
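The "hangs forever" part is the classic lost-wakeup pattern. Here is a standalone demo of my own (not QEMU code - QEMU's EventNotifier is an eventfd on Linux): one read() acknowledges every completion signalled so far, so a callback that then re-enters the poll loop to wait for an already-signalled completion blocks for good:

#include <stdio.h>
#include <poll.h>
#include <sys/eventfd.h>

int main(void)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    eventfd_t val;

    /* Two worker threads finish and signal their completions... */
    eventfd_write(efd, 1);
    eventfd_write(efd, 1);

    /* ...but a single read() acknowledges both signals at once. */
    eventfd_read(efd, &val);
    printf("one read consumed %llu completion signals\n",
           (unsigned long long)val);

    /* If the first completion's callback now waits for the second
     * completion by polling the descriptor again, nothing will ever
     * wake it up - that signal is already gone: */
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int ready = poll(&pfd, 1, 1000 /* -1 in the real deadlock */);
    printf("poll() saw %d ready descriptors\n", ready);
    return 0;
}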
[Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other
When two coroutines submit I/O, and the first coroutine depends on the second to complete (by calling bdrv_drain_all), a deadlock may occur. This is because both requests may have completed before the thread pool notifier got called. Then, when the notifier gets executed and the first coroutine calls aio_poll() to make progress, it will hang forever, as the notifier's descriptor has already been marked clear. This patch fixes this by re-arming the thread pool notifier if there is more than one completed request on the list. Without this patch, I could reproduce this bug with snapshot-commit about 1 time in 10 tries. With this patch, I couldn't reproduce it any more.

Signed-off-by: Marcin Gibula
---
--- thread-pool.c	2014-04-17 15:44:45.0 +0200
+++ thread-pool.c	2014-05-31 20:20:26.083011514 +0200
@@ -76,6 +76,8 @@ struct ThreadPool {
     int new_threads;     /* backlog of threads we need to create */
     int pending_threads; /* threads created but not running yet */
     int pending_cancellations; /* whether we need a cond_broadcast */
+    int pending_completions; /* whether we need to rearm notifier when
+                                executing callback */
     bool stopping;
 };
 
@@ -110,6 +112,10 @@ static void *worker_thread(void *opaque)
         ret = req->func(req->arg);
         req->ret = ret;
 
+        if (req->common.cb) {
+            pool->pending_completions++;
+        }
+
         /* Write ret before state. */
         smp_wmb();
         req->state = THREAD_DONE;
@@ -185,6 +191,14 @@ restart:
         }
         if (elem->state == THREAD_DONE && elem->common.cb) {
             QLIST_REMOVE(elem, all);
+            /* If more completed requests are waiting, the notifier needs
+             * to be rearmed so the callback can progress with aio_poll().
+             */
+            pool->pending_completions--;
+            if (pool->pending_completions) {
+                event_notifier_set(notifier);
+            }
+
             /* Read state before ret. */
             smp_rmb();
             elem->common.cb(elem->common.opaque, elem->ret);
Re: [Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other
> Good catch! The main problem with the patch is that you need to use atomic_inc/atomic_dec to increment and decrement pool->pending_completions.

Ok.

> Secondarily, event_notifier_set is pretty heavy-weight; does it work if you wrap the loop like this?
>
> restart:
>     QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
>         ...
>     }
>     if (pool->pending_completions) {
>         goto restart;
>     }
>     event_notifier_test_and_clear(notifier);
>     if (pool->pending_completions) {
>         event_notifier_set(notifier);
>         goto restart;
>     }

I'll test it tomorrow. I assume you want to avoid calling event_notifier_set() until the function is re-entered via aio_poll()?

> Finally, the same bug is also in block/linux-aio.c and block/win32-aio.c.

I can try with linux-aio, but my knowledge of the Windows API is zero...
-- mg
[Qemu-devel] [PATCH v2] thread-pool: fix deadlock when callbacks depend on each other
When two coroutines submit I/O, and the first coroutine depends on the second to complete (by calling bdrv_drain_all), a deadlock may occur. This is because both requests may have completed before the thread pool notifier got called. Then, when the notifier gets executed and the first coroutine calls aio_poll() to make progress, it will hang forever, as the notifier's descriptor has already been marked clear. This patch fixes this by deferring the clearing of the notifier until no completions are pending. Without this patch, I could reproduce this bug with snapshot-commit about 1 time in 10 tries. With this patch, I couldn't reproduce it any more.

Signed-off-by: Marcin Gibula
---
--- thread-pool.c	2014-04-17 15:44:45.0 +0200
+++ thread-pool.c	2014-06-02 09:10:25.442260590 +0200
@@ -76,6 +76,8 @@ struct ThreadPool {
     int new_threads;     /* backlog of threads we need to create */
     int pending_threads; /* threads created but not running yet */
     int pending_cancellations; /* whether we need a cond_broadcast */
+    int pending_completions; /* whether we need to rearm notifier when
+                                executing callback */
     bool stopping;
 };
 
@@ -110,6 +112,10 @@ static void *worker_thread(void *opaque)
         ret = req->func(req->arg);
         req->ret = ret;
 
+        if (req->common.cb) {
+            atomic_inc(&pool->pending_completions);
+        }
+
         /* Write ret before state. */
         smp_wmb();
         req->state = THREAD_DONE;
@@ -173,7 +179,6 @@ static void event_notifier_ready(EventNo
     ThreadPool *pool = container_of(notifier, ThreadPool, notifier);
     ThreadPoolElement *elem, *next;
 
-    event_notifier_test_and_clear(notifier);
 restart:
     QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
         if (elem->state != THREAD_CANCELED && elem->state != THREAD_DONE) {
@@ -185,6 +190,8 @@
         }
         if (elem->state == THREAD_DONE && elem->common.cb) {
             QLIST_REMOVE(elem, all);
+            atomic_dec(&pool->pending_completions);
+
             /* Read state before ret. */
             smp_rmb();
             elem->common.cb(elem->common.opaque, elem->ret);
@@ -196,6 +203,19 @@ restart:
             qemu_aio_release(elem);
         }
     }
+
+    /* Double test of pending_completions is necessary to
+     * ensure that there is no race between testing it and
+     * clearing the notifier.
+     */
+    if (atomic_read(&pool->pending_completions)) {
+        goto restart;
+    }
+    event_notifier_test_and_clear(notifier);
+    if (atomic_read(&pool->pending_completions)) {
+        event_notifier_set(notifier);
+        goto restart;
+    }
 }
 
 static void thread_pool_cancel(BlockDriverAIOCB *acb)
Re: [Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other
>> I'll test it tomorrow. I assume you want to avoid calling event_notifier_set() until the function is re-entered via aio_poll()?
> Yes. But actually, I need to check if it's possible to fix bdrv_drain_all. If you're in coroutine context, you can defer the draining to a safe point using a bottom half. If you're not in coroutine context, perhaps bdrv_drain_all has to be made illegal. Which means a bunch of code auditing...

For what it's worth, your solution also works fine; I couldn't recreate the hang with it. An updated patch proposal was posted earlier today.
-- mg
Re: [Qemu-devel] [PATCH v2] kvmclock: Ensure time in migration never goes backward
> +    cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> +
> +    delta = migration_tsc - time.tsc_timestamp;

Hi, when I was testing live storage migration with libvirt, I found out that this patch can cause the virtual machine to hang when completing a mirror job. This is (probably) because kvmclock_current_nsec() is called twice in a row, and on the second call time.tsc_timestamp is larger than migration_tsc. This causes delta to be huge and sets the timer to an invalid value. The double call happens when switching from the old to the new disk (pivoting, in libvirt's nomenclature).

Example values:
First call:  migration_tsc: 12052203518652476, time_tsc: 12052203301565676, delta: 108543400
Second call: migration_tsc: 12052203518652476, time_tsc: 12052204478600322, delta: 9223372036374801885

Perhaps it is worth adding:

    if (time.tsc_timestamp > migration_tsc) {
        return 0;
    }

there? Untested though...
-- mg
Re: [Qemu-devel] [PATCH v2] kvmclock: Ensure time in migration never goes backward
> Can you give this patch a try? It should read the guest TSC values after stopping the VM.

Yes, this patch fixes that. Thanks,
-- mg
Re: [Qemu-devel] [PATCH v2] thread-pool: fix deadlock when callbacks depend on each other
On 04.06.2014 12:01, Stefan Hajnoczi wrote:
> On Mon, Jun 02, 2014 at 09:15:27AM +0200, Marcin Gibuła wrote:
>> When two coroutines submit I/O, and the first coroutine depends on the second to complete (by calling bdrv_drain_all), a deadlock may occur.
> bdrv_drain_all() is a very heavy-weight operation. Coroutines should avoid it if possible. Please post the file/line/function where this call was made; perhaps there is a better way to wait for the other coroutine. This isn't a fix for this bug, but it's a cleanup.

As in the original bug report:

#4  0x7f699c095c0a in bdrv_drain_all () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805
#5  0x7f699c09c87e in bdrv_close (bs=bs@entry=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1695
#6  0x7f699c09c5fa in bdrv_delete (bs=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1978
#7  bdrv_unref (bs=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:5198
#8  0x7f699c09c812 in bdrv_drop_intermediate (active=active@entry=0x7f699ebfd330, top=top@entry=0x7f699f0bc520, base=base@entry=0x7f699eec43d0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:2567
#9  0x7f699c0a1963 in commit_run (opaque=0x7f699f17dcc0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block/commit.c:144
#10 0x7f699c0e0dca in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/coroutine-ucontext.c:118

mirror_run probably has this as well. I didn't check the others.
-- mg
Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc value for kvmclock_current_nsec calculation
On 2015-08-14 03:23, Li, Liang Z wrote:
>> On Thu, Aug 13, 2015 at 01:25:29AM +, Li, Liang Z wrote:
>>> Hi Paolo & Marcelo, could you please point out what issue the patch 317b0a6d8ba44e tries to fix? I found that in live migration cpu_synchronize_all_states() is called twice, and it sometimes takes more than 1 ms. I'd like to do some optimization, but I lack knowledge about the background.
>> What the code in 317b0a6d8ba44e requires is to retrieve the TSC value from the kernel.
> I know 317b0a6d8ba44e is there to retrieve the TSC value, but I don't understand why it is needed. During live migration, cpu_synchronize_all_states() will be called later, after stopping kvm-clock; env->tsc will be updated, so is that not enough? Or is there some case, like calling stop_vm(RUN_STATE_PAUSED) or stop_vm(RUN_STATE_DEBUG), that requires updating env->tsc? From googling, I can see that your patch fixed some issue, but I don't know what the exact issue is.

I remember testing these, and as far as I remember, that was the reason: http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html
-- mg
Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc value for kvmclock_current_nsec calculation
> Thanks for your reply. I have read the thread in your email; what is meant by 'switching from old to new disk'? Could you give a detailed description?

The test case was like this (using libvirt):
1. Get a VM running (Linux, using kvmclock),
2. Use blockcopy to copy the disk data from one location to another,
3. Issue blockjob --pivot (to finish the mirroring).

From what I remember, at point 3 the VM is momentarily paused and resumed, so the kvm state change handler is called twice. Without this patch, the VM hung because its time went backwards (or qemu crashed, if the assertion was not compiled out).
-- mg
Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc value for kvmclock_current_nsec calculation
> So, the problem is caused by stop_vm(RUN_STATE_PAUSED); in this case env->tsc is not updated, which leads to the issue. Is that right?

I think so.

> If cpu_clean_all_dirty() is needed just for the APIC status reason, I think we can do cpu_synchronize_all_states() in do_vm_stop, after vm_state_notify(), when RUN_STATE_PAUSED is hit; at that point all the device models are stopped and there is no outdated APIC status.

Yes, cpu_clean_all_dirty() was needed because without it, the second call to cpu_synchronize_all_states() (which is done inside qemu_savevm_state_complete(), after kvmclock) does nothing.

> I want to write a patch to fix this issue in another way; could you help verify it in your environment? I would very much appreciate it.

Sure, I'll test it. Both issues were quite easy to reproduce.
-- mg
Re: [Qemu-devel] [RFC 0/2] Reduce the VM downtime about 300us
On 2015-08-25 07:52, Liang Li wrote:
> This patch is for kvm live migration optimization; it fixes the issue which commit 317b0a6d8ba tries to fix in another way, and it can reduce the live migration VM downtime by about 300us.
> *This patch is not tested for the issue commit 317b0a6d8ba tries to fix*

I'll try to test it within the next few days.
-- mg