[Qemu-devel] qemu 2.0, deadlock in block-commit
Hi, I've encountered a deadlock in qemu during some stress testing. The test is making snapshots, committing them and constantly querying for block job info. The version of QEMU is 2.0.0 rc3 (the backtrace below says rc2, but it's manually patched to rc3); there seem to be no changes in the block layer in the final 2.0 (?). This is the backtrace of the qemu process:

(gdb) thread apply all backtrace

Thread 22 (Thread 0x7f6994852700 (LWP 13651)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699ab4c4eb in ?? () from /usr/lib64/librados.so.2
#2 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f698700 (LWP 13652)):
#0 0x7f69982f5ff1 in sem_timedwait () from /lib64/libpthread.so.0
#1 0x7f699ac3e1b8 in ?? () from /usr/lib64/librados.so.2
#2 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f698f7fe700 (LWP 13653)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699ab7b383 in ?? () from /usr/lib64/librados.so.2
#2 0x7f699abe625d in ?? () from /usr/lib64/librados.so.2
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f698effd700 (LWP 13654)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699abe1c88 in ?? () from /usr/lib64/librados.so.2
#2 0x7f699abe6a6d in ?? () from /usr/lib64/librados.so.2
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f698e7fc700 (LWP 13655)):
#0 0x7f69982f40de in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699aaeced8 in ?? () from /usr/lib64/librados.so.2
#2 0x7f699aaede0d in ?? () from /usr/lib64/librados.so.2
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f698dffb700 (LWP 13656)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699aaee862 in ?? () from /usr/lib64/librados.so.2
#2 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f698d7fa700 (LWP 13657)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699abd288e in ?? () from /usr/lib64/librados.so.2
#2 0x7f699abddf1d in ?? () from /usr/lib64/librados.so.2
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f698d6f9700 (LWP 13658)):
#0 0x7f699802007d in poll () from /lib64/libc.so.6
#1 0x7f699abc56ac in ?? () from /usr/lib64/librados.so.2
#2 0x7f699abc7460 in ?? () from /usr/lib64/librados.so.2
#3 0x7f699abd9c2c in ?? () from /usr/lib64/librados.so.2
#4 0x7f699abde03d in ?? () from /usr/lib64/librados.so.2
#5 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#6 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f698d5f8700 (LWP 13659)):
#0 0x7f69982f40de in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699aaeced8 in ?? () from /usr/lib64/librados.so.2
#2 0x7f699aaede0d in ?? () from /usr/lib64/librados.so.2
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f698cdf7700 (LWP 13660)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699aaee862 in ?? () from /usr/lib64/librados.so.2
#2 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f697700 (LWP 13661)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699aaee862 in ?? () from /usr/lib64/librados.so.2
#2 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#3 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f697f7fe700 (LWP 13662)):
#0 0x7f69982f40de in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f699a5bc666 in ?? () from /usr/lib64/librbd.so.1
#2 0x7f699a5cf76d in ?? () from /usr/lib64/librbd.so.1
#3 0x7f69982eff3a in start_thread () from /lib64/libpthread.so.0
#4 0x7f6998029dad in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f698c5f6700 (LWP 13663)):
#0 0x7f69982f3d0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpt
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
On 2014-05-22 22:49, Marcin Gibuła wrote:
> Thread 1 (Thread 0x7f699bfcd900 (LWP 13647)):
> #0 0x7f6998020286 in ppoll () from /lib64/libc.so.6
> #1 0x7f699c1f3d9b in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
> #2 qemu_poll_ns (fds=, nfds=, timeout=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:311
> #3 0x7f699c0877e0 in aio_poll (ctx=0x7f699e4c9c00, blocking=blocking@entry=true) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/aio-posix.c:221
> #4 0x7f699c095c0a in bdrv_drain_all () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805

Some more info. The VM was doing a lot of write IO during this test.

ppoll() is listening for these descriptors (from strace):

ppoll([{fd=25, events=POLLIN|POLLERR|POLLHUP}, {fd=23, events=POLLIN|POLLERR|POLLHUP}, {fd=17, events=POLLIN|POLLERR|POLLHUP}, {fd=4, events=POLLIN|POLLERR|POLLHUP}], 4, NULL, NULL, 8, ...)

fd # ls -l 25 23 17 4
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 17 -> anon_inode:[eventfd]
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 23 -> anon_inode:[eventfd]
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 25 -> anon_inode:[eventfd]
lrwx------ 1 usr_5062 qemu 64 May 22 23:00 4 -> anon_inode:[eventfd]

The VM is started via libvirt. No errors are reported in the logs. The command line is:

/usr/bin/qemu-system-x86_64 -machine accel=kvm -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,-kvmclock -m 1536 -realtime mlock=on -smp 2,sockets=2,cores=10,threads=1 -uuid 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -smbios type=0,vendor=HAL 9000 -smbios type=1,manufacturer=cloud -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/68189c3c-02f6-4aae-88a2-5f13c5e6f53a.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/dev/cube2/5f751718-ff36-420f-b034-5f31230b5f23,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/dev/cube1/c5b7a6e3-11f8-4b08-ac3e-5ea054028221,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/mnt/nfs/volumes/66346c1b-add5-4412-89d9-b00a3bb13e75/72be1b50-982e-458a-9a84-c0fbd48b4b3c.qcow2,if=none,id=drive-virtio-disk2,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/mnt/nfs/volumes/a20c3b29-6f21-4b3d-a3fb-8b80599e50df/b84716ea-2564-47cc-bbbf-dea6029132b4.qcow2,if=none,id=drive-virtio-disk3,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk3,id=virtio-disk3 -drive file=/mnt/nfs/volumes/0c2996b5-abec-47ea-9e88-ebd7ebf0c79d/453cb20a-1705-45e2-9f9e-bc1ea096d52a.qcow2,if=none,id=drive-virtio-disk4,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk4,id=virtio-disk4 -drive file=/mnt/nfs/volumes/7dcbd9ba-f0bc-4d3c-9b5c-b2ac824584d5/a8bb7e11-a9b5-4613-9b63-b9722fba2166.qcow2,if=none,id=drive-virtio-disk5,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xb,drive=drive-virtio-disk5,id=virtio-disk5 -drive file=rbd:iso-images/rescue.iso:auth_supported=none,if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=19,id=hostnet0,vhost=on,vhostfd=20 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:82:41:c9,bus=pci.0,addr=0x3 -netdev tap,fd=21,id=hostnet1,vhost=on,vhostfd=22 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:70:10:35,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/68189c3c-02f6-4aae-88a2-5f13c5e6f53a.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/68189c3c-02f6-4aae-88a2-5f13c5e6f53a.cloud.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=cha
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
>> I've encountered a deadlock in qemu during some stress testing. The test is making snapshots, committing them and constantly querying for block job info.
>
> What is the exact command you used for triggering the block-commit? Was it via direct HMP or QMP, or indirect via libvirt?

Via libvirt.

> Were you trying to commit the active layer?

No. The commit was to an intermediate file. I'm aware that libvirt does not support active layer commit yet. Plus, judging from the backtrace, the hang seems to be deep inside qemu. The VM is unresponsive after this.

--
mg
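[Editorial note: when libvirt performs an intermediate (non-active) commit like the one described here, it ends up issuing a QMP block-commit command. A hypothetical invocation against one of the disks from this thread's command line - the snapshot file names below are made up for illustration - would look roughly like:]

{ "execute": "block-commit",
  "arguments": { "device": "drive-virtio-disk2",
                 "top": "/mnt/nfs/volumes/.../snap1.qcow2",
                 "base": "/mnt/nfs/volumes/.../base.qcow2" } }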
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
On 23.05.2014 10:19, Paolo Bonzini wrote:
> On 22/05/2014 23:05, Marcin Gibuła wrote:
>> Some more info. The VM was doing a lot of write IO during this test.
>
> QEMU is waiting for librados to complete I/O. Can you reproduce it with a different driver?

I'll try. However, RBD is used only as a read-only ISO (rbd:iso-images/rescue.iso:auth_supported=none,if=none,id=drive-ide0-0-0,readonly=on,format=raw) - what IO would it have to complete?

--
mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
On 23.05.2014 10:19, Paolo Bonzini wrote:
> On 22/05/2014 23:05, Marcin Gibuła wrote:
>> Some more info. The VM was doing a lot of write IO during this test.
>
> QEMU is waiting for librados to complete I/O. Can you reproduce it with a different driver?

Hi, I've reproduced it without RBD. Backtrace below:

(gdb) thread apply all backtrace

Thread 4 (Thread 0x7f9c8cccd700 (LWP 2017)):
#0 0x7f9c907717a4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x7f9c9076d19c in _L_lock_518 () from /lib64/libpthread.so.0
#2 0x7f9c9076cfeb in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x7f9c947addf9 in qemu_mutex_lock (mutex=mutex@entry=0x7f9c95002660 ) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:76
#4 0x7f9c946b3a10 in qemu_mutex_lock_iothread () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:1043
#5 0x7f9c9470cf3d in kvm_cpu_exec (cpu=cpu@entry=0x7f9c968bf290) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1683
#6 0x7f9c946b271c in qemu_kvm_cpu_thread_fn (arg=0x7f9c968bf290) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#7 0x7f9c9076af3a in start_thread () from /lib64/libpthread.so.0
#8 0x7f9c904a4dad in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f9c87fff700 (LWP 2018)):
#0 0x7f9c9049c897 in ioctl () from /lib64/libc.so.6
#1 0x7f9c9470cdf9 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7f9c968fa300, type=type@entry=44672) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1796
#2 0x7f9c9470cf35 in kvm_cpu_exec (cpu=cpu@entry=0x7f9c968fa300) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1681
#3 0x7f9c946b271c in qemu_kvm_cpu_thread_fn (arg=0x7f9c968fa300) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#4 0x7f9c9076af3a in start_thread () from /lib64/libpthread.so.0
#5 0x7f9c904a4dad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f9c869ff700 (LWP 2020)):
#0 0x7f9c9076ed0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f9c947ae019 in qemu_cond_wait (cond=cond@entry=0x7f9c9695a250, mutex=mutex@entry=0x7f9c9695a280) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:135
#2 0x7f9c946a270b in vnc_worker_thread_loop (queue=queue@entry=0x7f9c9695a250) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:222
#3 0x7f9c946a2ae0 in vnc_worker_thread (arg=0x7f9c9695a250) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:323
#4 0x7f9c9076af3a in start_thread () from /lib64/libpthread.so.0
#5 0x7f9c904a4dad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f9c94448900 (LWP 2013)):
#0 0x7f9c9049b286 in ppoll () from /lib64/libc.so.6
#1 0x7f9c9466ed9b in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=, nfds=, timeout=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:311
#3 0x7f9c945027e0 in aio_poll (ctx=0x7f9c95d5bc00, blocking=blocking@entry=true) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/aio-posix.c:221
#4 0x7f9c94510c0a in bdrv_drain_all () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805
#5 0x7f9c9451787e in bdrv_close (bs=bs@entry=0x7f9c969b7d90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1695
#6 0x7f9c945175fa in bdrv_delete (bs=0x7f9c969b7d90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1978
#7 bdrv_unref (bs=0x7f9c969b7d90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:5198
#8 0x7f9c94517812 in bdrv_drop_intermediate (active=active@entry=0x7f9c9648f490, top=top@entry=0x7f9c969b7d90, base=base@entry=0x7f9c96756500) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:2567
#9 0x7f9c9451c963 in commit_run (opaque=0x7f9c96a1e280) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block/commit.c:144
#10 0x7f9c9455bdca in coroutine_trampoline (i0=, i1=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/coroutine-ucontext.c:118
#11 0x7f9c904009f0 in ?? () from /lib64/libc.so.6
#12 0x7fffe4bcfee0 in ?? ()
#13 0x in ?? ()

I still have this process running (hanging ;)) if you need any more info. I also have no problems with reproducing it.

--
mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> I see that you have a mix of aio=native and aio=threads. I can't say much about the aio=native disks (perhaps try to reproduce without them?), but there are definitely no worker threads for the other disks that bdrv_drain_all() would have to wait for.

True. But I/O was being done only to the qcow2 disk with the threads backend. And the snapshot was made on this disk. I'll try to reproduce with all 'threads'.

> bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the function that determines for each of the disks in your VM if it still has requests in flight that need to be completed. This function must have returned true even though there is nothing to wait for. Can you check which of its conditions led to this behaviour, and for which disk it did? Either by setting a breakpoint there and singlestepping through the function the next time it is called (if the poll even has a timeout), or by inspecting the conditions manually in gdb.

I'm on it.

--
mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the function that determines for each of the disks in your VM if it still has requests in flight that need to be completed. This function must have returned true even though there is nothing to wait for. Can you check which of its conditions led to this behaviour, and for which disk it did? Either by setting a breakpoint there and singlestepping through the function the next time it is called (if the poll even has a timeout), or by inspecting the conditions manually in gdb.

The condition that is true is:

if (!QLIST_EMPTY(&bs->tracked_requests))

and it's returned for the intermediate qcow2 which is being committed.

--
mg
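[Editorial note: for context, this is roughly what the function under discussion looked like in 2.0-era block.c - a sketch from memory, not a verbatim copy, so line-level details may differ:]

static bool bdrv_requests_pending(BlockDriverState *bs)
{
    /* In-flight reads/writes are kept on this list. */
    if (!QLIST_EMPTY(&bs->tracked_requests)) {
        return true;
    }
    /* Requests queued by I/O throttling (separate read/write queues). */
    if (!qemu_co_queue_empty(&bs->throttled_reqs[0])) {
        return true;
    }
    if (!qemu_co_queue_empty(&bs->throttled_reqs[1])) {
        return true;
    }
    /* Recurse into the protocol layer and the backing file. */
    if (bs->file && bdrv_requests_pending(bs->file)) {
        return true;
    }
    if (bs->backing_hd && bdrv_requests_pending(bs->backing_hd)) {
        return true;
    }
    return false;
}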
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> The condition that is true is:
>
> if (!QLIST_EMPTY(&bs->tracked_requests))
>
> and it's returned for the intermediate qcow2 which is being committed.

Btw - it's also the disk that is being pounded with writes during the commit.

--
mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> If you see a pending request on a RADOS block device (rbd) then it would be good to dig deeper into QEMU's block/rbd.c driver to see why it's not completing that request. Are you using qcow2 on top of rbd?

Hi, I've already recreated this without rbd and with stock qemu 2.0.

--
mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
On 2014-05-23 15:14, Marcin Gibuła wrote:
>> bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the function that determines for each of the disks in your VM if it still has requests in flight that need to be completed. This function must have returned true even though there is nothing to wait for. Can you check which of its conditions led to this behaviour, and for which disk it did? Either by setting a breakpoint there and singlestepping through the function the next time it is called (if the poll even has a timeout), or by inspecting the conditions manually in gdb.
>
> The condition that is true is:
>
> if (!QLIST_EMPTY(&bs->tracked_requests))
>
> and it's returned for the intermediate qcow2 which is being committed.

My mistake - this condition is true not for the intermediate file, but for the active one. Sorry for the confusion.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> Does anybody know why the APIC state loaded by the first call to kvm_arch_get_registers() is wrong, in the first place? What exactly is different in the APIC state in the second kvm_arch_get_registers() call, and when/why does it change? If cpu_synchronize_state() does the wrong thing if it is called at the wrong moment, then we may have other hidden bugs, because the user can trigger cpu_synchronize_all_states() calls arbitrarily using monitor commands.

My guess is, it's not wrong, it's just outdated when the second call occurs. Maybe it's an ordering issue - could the kvmclock state change handler be called before other activity is suspended?

I didn't pursue it further, because I don't know much (anything, really) about QEMU/APIC internals and how to track their changes.

--
mg
Re: [Qemu-devel] [PATCH uq/master] kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation
> @@ -65,6 +66,7 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>
>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>
> +    assert(time.tsc_timestamp <= migration_tsc);
>      delta = migration_tsc - time.tsc_timestamp;
>      if (time.tsc_shift < 0) {
>          delta >>= -time.tsc_shift;
> @@ -123,6 +125,8 @@ static void kvmclock_vm_state_change(void *opaque, int running,
>          if (s->clock_valid) {
>              return;
>          }
> +
> +        cpu_synchronize_all_states();
>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>          if (ret < 0) {
>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
>
> This causes a hang during migration, so I'll revert the patch from 2.1.

For me this patch series fixed all hangs I had with migration (at least with qemu 2.0).

--
mg
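[Editorial note: a sketch of the failure mode the new assert guards against (my reading; both values are unsigned 64-bit TSC readings, as in the patch context):]

/* If the guest-visible kvmclock tsc_timestamp is *ahead* of the TSC
 * value sampled for migration, this unsigned subtraction wraps around
 * and delta becomes a huge bogus value that feeds into the guest's
 * clock - i.e. guest time jumps far into the future. */
uint64_t delta = migration_tsc - time.tsc_timestamp;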
Re: [Qemu-devel] [PATCH v2 0/2] thread-pool: avoid fd usage and fix nested aio_poll() deadlock
On 2014-07-15 17:17, Paolo Bonzini wrote:
> On 15/07/2014 16:44, Stefan Hajnoczi wrote:
>> v2:
>>  * Leave BH scheduled so that the code can be simplified [Paolo]
>>
>> These patches convert thread-pool.c from EventNotifier to QEMUBH. They then solve the deadlock when nested aio_poll() calls are made.
>>
>> Please speak out whether you want this in QEMU 2.1 or not. I'm not aware of the nested aio_poll() deadlock ever having been reported, so maybe we can defer to QEMU 2.2.
>
> It was reported as a hang in block_commit. Marcin, can you please test these patches?

I'll try to test it tomorrow. The same hang also existed in linux-aio, however (I was able to reproduce it).

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> Andrey, can you please provide instructions on how to create a reproducible environment?
>
> The following patch is equivalent to the original patch, for the purposes of fixing the kvmclock problem. Perhaps it becomes easier to spot the reason for the hang you are experiencing.

Marcelo, the original reason for the patch adding cpu_synchronize_all_states() there was that this bug affected non-migration operations as well - http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html. Won't moving it only to the migration code break these things again?

> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 272a88a..feb5fc5 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -17,7 +17,6 @@
>  #include "qemu/host-utils.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
> -#include "sysemu/cpus.h"
>  #include "hw/sysbus.h"
>  #include "hw/kvm/clock.h"
>
> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>
>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>
> -    assert(time.tsc_timestamp <= migration_tsc);
>      delta = migration_tsc - time.tsc_timestamp;
>      if (time.tsc_shift < 0) {
>          delta >>= -time.tsc_shift;
> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
>          if (s->clock_valid) {
>              return;
>          }
> -
> -        cpu_synchronize_all_states();
>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>          if (ret < 0) {
>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> diff --git a/migration.c b/migration.c
> index 8d675b3..34f2325 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>              qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>              old_vm_running = runstate_is_running();
>
> +            cpu_synchronize_all_states();
>              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>              if (ret >= 0) {
>                  qemu_file_set_rate_limit(s->file, INT64_MAX);

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> Tested on an iscsi pool (there is a no-cache requirement; rbd with cache disabled may survive one migration, but the iscsi backend always hangs). As before, just rolling back the problematic commit fixes the problem, and adding cpu_synchronize_all_states to migration.c makes no difference at a glance in the VM's behavior.
>
> The problem consists of at least two separate ones: the current hang, and the behavior with the unreverted patch from agraf - the latter causes live migration with writeback cache to fail; cache=none works well in any variant which survives the first condition.
>
> Marcin, would you mind checking the current state of the problem on your environments in your spare time? It is probably easier to reproduce on iscsi because of the way smaller time needed to set it up; command line and libvirt config attached (v2.1.0-rc2 plus iscsi-1.11.0).

Ok, but what exactly do you want me to test? Just to avoid any confusion, originally there were two problems with kvmclock:

1. Commit a096b3a6732f846ec57dc28b47ee9435aa0609bf fixes a problem where clock drift (?) caused kvmclock in the guest to report a time in the past, which caused the guest kernel to hang. This is hard to reproduce reliably (probably because it requires a long time for the drift to accumulate).

2. Commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 fixes a regression caused by a096b3a6732f846ec57dc28b47ee9435aa0609bf which occurred during non-migration operations (drive-mirror + pivot), and which also caused the guest kernel to hang. This is trivial to reproduce.

I'm using both of them applied on top of 2.0 in production and have no problems with them. I'm using NFS exclusively with cache=none.

So, I shall test vm-migration and drive-migration with 2.1.0-rc2, with no extra patches applied or reverted, on a VM that is running fio - am I correct?

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> I'm using both of them applied on top of 2.0 in production and have no problems with them. I'm using NFS exclusively with cache=none.
>>
>> So, I shall test vm-migration and drive-migration with 2.1.0-rc2, with no extra patches applied or reverted, on a VM that is running fio - am I correct?
>
> Yes, exactly. An ISCSI-based setup can take some minutes to deploy, given a prepared image, and I have a one hundred percent hit rate for the original issue with it.

I've reproduced your IO hang with 2.0 and both 9b1786829aefb83f37a8f3135e3ea91c56001b56 and a096b3a6732f846ec57dc28b47ee9435aa0609bf applied. Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 indeed fixes the problem (but reintroduces the block-migration hang).

It seems like a qemu bug rather than a guest problem, as the no-kvmclock parameter makes no difference. IO just stops; all qemu IO threads die off. Almost like it forgets to migrate them :-)

I'm attaching a backtrace from the guest kernel and qemu, and the qemu command line. Going to compile 2.1-rc.

--
mg

[ 254.634525] SysRq : Show Blocked State
[ 254.635041] task PC stack pid father
[ 254.635304] kworker/0:2 D 88013fc145c0 083 2 0x
[ 254.635304] Workqueue: xfs-log/vdb xfs_log_worker [xfs]
[ 254.635304] 880136bdfa58 0046 880136bdffd8 000145c0
[ 254.635304] 880136bdffd8 000145c0 880136ad8000 88013fc14e88
[ 254.635304] 880037bd4380 880037bc5068 880037bd43b0 880037bd4380
[ 254.635304] Call Trace:
[ 254.635304] [] io_schedule+0x9d/0x140
[ 254.635304] [] get_request+0x1b5/0x790
[ 254.635304] [] ? wake_up_bit+0x30/0x30
[ 254.635304] [] blk_queue_bio+0x96/0x390
[ 254.635304] [] generic_make_request+0xe2/0x130
[ 254.635304] [] submit_bio+0x71/0x150
[ 254.635304] [] ? bio_alloc_bioset+0x1e8/0x2e0
[ 254.635304] [] _xfs_buf_ioapply+0x2bb/0x3d0 [xfs]
[ 254.635304] [] ? xlog_bdstrat+0x1f/0x50 [xfs]
[ 254.635304] [] xfs_buf_iorequest+0x46/0xa0 [xfs]
[ 254.635304] [] xlog_bdstrat+0x1f/0x50 [xfs]
[ 254.635304] [] xlog_sync+0x265/0x450 [xfs]
[ 254.635304] [] xlog_state_release_iclog+0x92/0xb0 [xfs]
[ 254.635304] [] _xfs_log_force+0x15a/0x290 [xfs]
[ 254.635304] [] ? __switch_to+0x136/0x490
[ 254.635304] [] xfs_log_force+0x26/0x80 [xfs]
[ 254.635304] [] xfs_log_worker+0x24/0x50 [xfs]
[ 254.635304] [] process_one_work+0x17b/0x460
[ 254.635304] [] worker_thread+0x11b/0x400
[ 254.635304] [] ? rescuer_thread+0x400/0x400
[ 254.635304] [] kthread+0xcf/0xe0
[ 254.635304] [] ? kthread_create_on_node+0x140/0x140
[ 254.635304] [] ret_from_fork+0x7c/0xb0
[ 254.635304] [] ? kthread_create_on_node+0x140/0x140
[ 254.635304] fio D 88013fc145c0 0 772770 0x
[ 254.635304] 8800bba4b8c8 0082 8800bba4bfd8 000145c0
[ 254.635304] 8800bba4bfd8 000145c0 8801376ff1c0 88013fc14e88
[ 254.635304] 880037bd4380 880037baba90 880037bd43b0 880037bd4380
[ 254.635304] Call Trace:
[ 254.635304] [] io_schedule+0x9d/0x140
[ 254.635304] [] get_request+0x1b5/0x790
[ 254.635304] [] ? wake_up_bit+0x30/0x30
[ 254.635304] [] blk_queue_bio+0x96/0x390
[ 254.635304] [] generic_make_request+0xe2/0x130
[ 254.635304] [] submit_bio+0x71/0x150
[ 254.635304] [] do_blockdev_direct_IO+0x14bc/0x2620
[ 254.635304] [] ? xfs_get_blocks+0x20/0x20 [xfs]
[ 254.635304] [] __blockdev_direct_IO+0x55/0x60
[ 254.635304] [] ? xfs_get_blocks+0x20/0x20 [xfs]
[ 254.635304] [] xfs_vm_direct_IO+0x15c/0x180 [xfs]
[ 254.635304] [] ? xfs_get_blocks+0x20/0x20 [xfs]
[ 254.635304] [] generic_file_aio_read+0x6d3/0x750
[ 254.635304] [] ? ktime_get_ts+0x48/0xe0
[ 254.635304] [] ? delayacct_end+0x8f/0xb0
[ 254.635304] [] ? down_read+0x12/0x30
[ 254.635304] [] xfs_file_aio_read+0x154/0x2e0 [xfs]
[ 254.635304] [] ? xfs_file_splice_read+0x140/0x140 [xfs]
[ 254.635304] [] do_io_submit+0x3b8/0x840
[ 254.635304] [] SyS_io_submit+0x10/0x20
[ 254.635304] [] system_call_fastpath+0x16/0x1b

Thread 3 (Thread 0x7f4250f50700 (LWP 11955)):
#0 0x7f4253d1a897 in ioctl () from /lib64/libc.so.6
#1 0x7f4257f8adf9 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7f4258e2aa90, type=type@entry=44672) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1796
#2 0x7f4257f8af35 in kvm_cpu_exec (cpu=cpu@entry=0x7f4258e2aa90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1681
#3 0x7f4257f3071c in qemu_kvm_cpu_thread_fn (arg=0x7f4258e2aa90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#4 0x7f4253fe8f3a in start_thread () from /lib64/libpthread.so.0
#5 0x7f4253d22dad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f424b5ff700 (LWP 11957)):
#0 0x7f4253fecd0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x7f425802c019 in qemu_cond_wait (cond=cond@entry=0x7f4258f0cf
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> I've reproduced your IO hang with 2.0 and both 9b1786829aefb83f37a8f3135e3ea91c56001b56 and a096b3a6732f846ec57dc28b47ee9435aa0609bf applied. Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 indeed fixes the problem (but reintroduces the block-migration hang).
>
> It seems like a qemu bug rather than a guest problem, as the no-kvmclock parameter makes no difference. IO just stops; all qemu IO threads die off. Almost like it forgets to migrate them :-)

Some more info:

a) 2.0 + 9b1786829aefb83f37a8f3135e3ea91c56001b56 + a096b3a6732f846ec57dc28b47ee9435aa0609bf = hangs
b) 2.0 + 9b1786829aefb83f37a8f3135e3ea91c56001b56 = works
c) 2.0 + 9b1786829aefb83f37a8f3135e3ea91c56001b56 + move cpu_synchronize_state to migration.c = works

Tested with NFS (qcow2) + cache=none. IO is dead only for the disk that was being written to during migration. I.e. if my test VM has two disks, vda and vdb, and I'm running fio on vdb and it hangs after migration, I can still issue writes to vda.

Recreation steps:
1. Create a VM.
2. Run fio (Andrey's config).
3. Live migrate the VM a couple of times.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> Yes, exactly. An ISCSI-based setup can take some minutes to deploy, given a prepared image, and I have a one hundred percent hit rate for the original issue with it.
>
> I've reproduced your IO hang with 2.0 and both 9b1786829aefb83f37a8f3135e3ea91c56001b56 and a096b3a6732f846ec57dc28b47ee9435aa0609bf applied. Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 indeed fixes the problem (but reintroduces the block-migration hang).
>
> It seems like a qemu bug rather than a guest problem, as the no-kvmclock parameter makes no difference. IO just stops; all qemu IO threads die off. Almost like it forgets to migrate them :-)
>
> I'm attaching a backtrace from the guest kernel and qemu, and the qemu command line. Going to compile 2.1-rc.

2.1-rc2 behaves exactly the same. Interestingly enough, resetting the guest system causes I/O to work again. So it's not qemu that hangs on IO; rather, it fails to notify the guest about completed operations that were issued during migration. And it's somehow caused by calling cpu_synchronize_all_states() inside kvmclock_vm_state_change().

As for testing with cache=writeback, I'll try to set up some iscsi to test it.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> 2.1-rc2 behaves exactly the same. Interestingly enough, resetting the guest system causes I/O to work again. So it's not qemu that hangs on IO; rather, it fails to notify the guest about completed operations that were issued during migration. And it's somehow caused by calling cpu_synchronize_all_states() inside kvmclock_vm_state_change().
>>
>> As for testing with cache=writeback, I'll try to set up some iscsi to test it.
>
> Awesome, thanks! AFAIK you'll not be able to use write cache with iscsi for migration. A VM which had a reset before always hangs; freshly launched ones have a chance to be migrated successfully. And yes, at a glance it looks like a lower layer forgetting to notify the driver about some operations.

Andrey, could you try the attached patch? It's an incredibly ugly workaround that calls cpu_synchronize_all_states() in a way that bypasses the lazy execution logic. But it works for me.

If that works for you as well, it's somehow related to the lazy execution of cpu_synchronize_all_states.

--
mg

diff -ru qemu-2.1.0-rc2/cpus.c qemu-2.1.0-rc2-fixed/cpus.c
--- qemu-2.1.0-rc2/cpus.c 2014-07-15 23:49:14.0 +0200
+++ qemu-2.1.0-rc2-fixed/cpus.c 2014-07-17 15:09:09.306696284 +0200
@@ -505,6 +505,15 @@
     }
 }
 
+void cpu_synchronize_all_states_always(void)
+{
+    CPUState *cpu;
+
+    CPU_FOREACH(cpu) {
+        cpu_synchronize_state_always(cpu);
+    }
+}
+
 void cpu_synchronize_all_post_reset(void)
 {
     CPUState *cpu;
diff -ru qemu-2.1.0-rc2/hw/i386/kvm/clock.c qemu-2.1.0-rc2-fixed/hw/i386/kvm/clock.c
--- qemu-2.1.0-rc2/hw/i386/kvm/clock.c 2014-07-15 23:49:14.0 +0200
+++ qemu-2.1.0-rc2-fixed/hw/i386/kvm/clock.c 2014-07-17 15:08:25.627063756 +0200
@@ -126,7 +126,7 @@
             return;
         }
 
-        cpu_synchronize_all_states();
+        cpu_synchronize_all_states_always();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {
             fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
diff -ru qemu-2.1.0-rc2/include/sysemu/cpus.h qemu-2.1.0-rc2-fixed/include/sysemu/cpus.h
--- qemu-2.1.0-rc2/include/sysemu/cpus.h 2014-07-15 23:49:14.0 +0200
+++ qemu-2.1.0-rc2-fixed/include/sysemu/cpus.h 2014-07-17 15:09:23.256578916 +0200
@@ -7,6 +7,7 @@
 void pause_all_vcpus(void);
 void cpu_stop_current(void);
 
+void cpu_synchronize_all_states_always(void);
 void cpu_synchronize_all_states(void);
 void cpu_synchronize_all_post_reset(void);
 void cpu_synchronize_all_post_init(void);
diff -ru qemu-2.1.0-rc2/include/sysemu/kvm.h qemu-2.1.0-rc2-fixed/include/sysemu/kvm.h
--- qemu-2.1.0-rc2/include/sysemu/kvm.h 2014-07-15 23:49:14.0 +0200
+++ qemu-2.1.0-rc2-fixed/include/sysemu/kvm.h 2014-07-17 15:11:54.855303171 +0200
@@ -346,9 +346,11 @@
 #endif /* NEED_CPU_H */
 
 void kvm_cpu_synchronize_state(CPUState *cpu);
+void kvm_cpu_synchronize_state_always(CPUState *cpu);
 void kvm_cpu_synchronize_post_reset(CPUState *cpu);
 void kvm_cpu_synchronize_post_init(CPUState *cpu);
 
+
 /* generic hooks - to be moved/refactored once there are more users */
 
 static inline void cpu_synchronize_state(CPUState *cpu)
@@ -358,6 +360,13 @@
     }
 }
 
+static inline void cpu_synchronize_state_always(CPUState *cpu)
+{
+    if (kvm_enabled()) {
+        kvm_cpu_synchronize_state_always(cpu);
+    }
+}
+
 static inline void cpu_synchronize_post_reset(CPUState *cpu)
 {
     if (kvm_enabled()) {
diff -ru qemu-2.1.0-rc2/kvm-all.c qemu-2.1.0-rc2-fixed/kvm-all.c
--- qemu-2.1.0-rc2/kvm-all.c 2014-07-15 23:49:14.0 +0200
+++ qemu-2.1.0-rc2-fixed/kvm-all.c 2014-07-17 15:14:04.884208826 +0200
@@ -1652,6 +1652,13 @@
     s->coalesced_flush_in_progress = false;
 }
 
+static void do_kvm_cpu_synchronize_state_always(void *arg)
+{
+    CPUState *cpu = arg;
+
+    kvm_arch_get_registers(cpu);
+}
+
 static void do_kvm_cpu_synchronize_state(void *arg)
 {
     CPUState *cpu = arg;
@@ -1669,6 +1676,11 @@
     }
 }
 
+void kvm_cpu_synchronize_state_always(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_kvm_cpu_synchronize_state_always, cpu);
+}
+
 void kvm_cpu_synchronize_post_reset(CPUState *cpu)
 {
     kvm_arch_put_registers(cpu, KVM_PUT_RESET_STATE);
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On 2014-07-17 21:18, Dr. David Alan Gilbert wrote:
> I don't know if this is the same case, but Gerd showed me a migration failure that might be related. 2.0 seems OK, 2.1-rc0 is broken (and I've not found another working point in between yet). The test case involves booting a fedora livecd (using an IDE CDROM device) and after the migration we're seeing squashfs errors and stuff gently falling apart.

Perhaps you could try testing the workaround patch I sent earlier? It's not a proposal for inclusion, just a test patch that seems to fix the IO hang for me.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> could you try the attached patch? It's an incredibly ugly workaround that calls cpu_synchronize_all_states() in a way that bypasses the lazy execution logic. But it works for me. If that works for you as well, it's somehow related to the lazy execution of cpu_synchronize_all_states.
>
> Yes, it is working well with writeback cache too.

Does it fix the problem with libvirt migration timing out for you as well?

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> Does it fix the problem with libvirt migration timing out for you as well?
>
> Oh, forgot to mention - yes, all migration-related problems are fixed. Though the release is in a freeze phase right now, I'd like to ask the maintainers to consider the possibility of fixing the problem on top of the current tree instead of just rolling back the problematic snippet.

Paolo, if the patch in its current form is not acceptable to you for inclusion, I'll try to rewrite it according to your comments.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
> The name of the hack^Wfunction is tricky, because compared to do_kvm_cpu_synchronize_state there are three things you change:
>
> 1) you always synchronize the state
>
> 2) the next call to do_kvm_cpu_synchronize_state will do kvm_arch_get_registers

Yes.

> 3) the next CPU entry will call kvm_arch_put_registers:
>
>     if (cpu->kvm_vcpu_dirty) {
>         kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
>         cpu->kvm_vcpu_dirty = false;
>     }

But I don't set cpu->kvm_vcpu_dirty anywhere (?).

> I still lean very much towards reverting the patches now. We can reapply them, fixed, in 2.1.1.

That's probably a good idea.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On 2014-07-18 11:37, Paolo Bonzini wrote:
> On 18/07/2014 11:32, Marcin Gibuła wrote:
>>> 3) the next CPU entry will call kvm_arch_put_registers:
>>>
>>>     if (cpu->kvm_vcpu_dirty) {
>>>         kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
>>>         cpu->kvm_vcpu_dirty = false;
>>>     }
>>
>> But I don't set cpu->kvm_vcpu_dirty anywhere (?).
>
> Yeah, the next CPU entry will *not* call kvm_arch_put_registers with your change. It will call it with vanilla cpu_synchronize_all_states().

That's because in kvmclock it's used only to read cpu registers, not to edit them.

Now, because making this call "invisible" makes it work, I'm speculating that the following happens:

[migration starts]
kvmclock: calls cpu_synchronize_all_states()
somewhere in qemu: completes IO
somewhere in qemu: calls cpu_synchronize_all_states() <- old state

Is it (or something similar) possible? I haven't dug deep enough into the internals yet, but perhaps you could tell me if that's the right direction?

--
mg
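[Editorial note: for readers following along, this is roughly the lazy logic being discussed, as it looked in 2.1-era kvm-all.c - a from-memory sketch, not verbatim:]

static void do_kvm_cpu_synchronize_state(void *arg)
{
    CPUState *cpu = arg;

    if (!cpu->kvm_vcpu_dirty) {
        kvm_arch_get_registers(cpu);  /* refresh QEMU's copy of the vcpu state */
        cpu->kvm_vcpu_dirty = true;   /* further syncs become no-ops until the vcpu re-enters KVM */
    }
}

void kvm_cpu_synchronize_state(CPUState *cpu)
{
    if (!cpu->kvm_vcpu_dirty) {
        run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
    }
}

[So once kvmclock's early call has set kvm_vcpu_dirty, the later cpu_synchronize_all_states() call made during migration does not re-read anything from KVM.]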
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On 29.07.2014 18:58, Paolo Bonzini wrote:
> On 18/07/2014 10:48, Paolo Bonzini wrote:
>> It is easy to find out if the "fix" is related to 1 or 2/3: just write
>>
>>     if (cpu->kvm_vcpu_dirty) {
>>         printf("do_kvm_cpu_synchronize_state_always: look at 2/3\n");
>>         kvm_arch_get_registers(cpu);
>>     } else {
>>         printf("do_kvm_cpu_synchronize_state_always: look at 1\n");
>>     }
>>
>> To further refine between 2 and 3, I suppose you can set a breakpoint on cpu_synchronize_all_states and kvm_cpu_exec, and see which is called first after cpu_synchronize_all_states_always.
>
> Marcin, have you ever gotten round to doing this?

Source side of migration, without my ugly hack:

called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
called kvm_cpu_synchronize_state: vcpu dirty
called kvm_cpu_synchronize_state: vcpu dirty
shutting down

without it:

called do_kvm_cpu_synchronize_state_always
called do_kvm_cpu_synchronize_state_always
called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
shutting down

So it's probably about 2 from your list ("the next call to do_kvm_cpu_synchronize_state will do kvm_arch_get_registers"). I've tapped into kvm_cpu_exec() to find out if it calls kvm_arch_put_registers(), but nothing was logged during migration, so it's probably not 3.

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On 2014-07-30 15:38, Paolo Bonzini wrote:
> On 30/07/2014 14:02, Marcin Gibuła wrote:
>> without it:

s/without/with/ of course...

>> called do_kvm_cpu_synchronize_state_always
>> called do_kvm_cpu_synchronize_state_always
>> called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
>> called do_kvm_cpu_synchronize_state: vcpu not dirty, getting registers
>> shutting down
>>
>> So it's probably about 2 from your list ("the next call to do_kvm_cpu_synchronize_state will do kvm_arch_get_registers").
>
> Can you dump *env before and after the call to kvm_arch_get_registers?

Yes, but it seems they are equal - I used memcmp() to compare them.

Is there any other side effect that cpu_synchronize_all_states() may have? The second caller of this function is qemu_savevm_state_complete().

--
mg
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
>> Can you dump *env before and after the call to kvm_arch_get_registers?
>
> Yes, but it seems they are equal - I used memcmp() to compare them.
>
> Is there any other side effect that cpu_synchronize_all_states() may have?

I think I found it. The reason for the hang is that when the second call to kvm_arch_get_registers() is skipped, it also skips kvm_get_apic(), which updates cpu->apic_state.

--
mg
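[Editorial note: to make the finding concrete - on x86, kvm_arch_get_registers() is what pulls the in-kernel APIC state back into QEMU. A simplified sketch of the relevant call chain (paraphrased; the real function also reads GPRs, sregs, MSRs, vcpu events, etc., and the exact signatures differ):]

int kvm_arch_get_registers(CPUState *cs)
{
    /* ... general-purpose registers, sregs, MSRs, ... */

    /* Copies the in-kernel APIC registers into cpu->apic_state.  If the
     * lazy-sync wrapper skips kvm_arch_get_registers(), this refresh is
     * skipped too, and migration saves a stale APIC state - matching the
     * observed "guest never sees its I/O completion interrupts" symptom. */
    kvm_get_apic(cs);

    /* ... vcpu events, debug registers, ... */
    return 0;
}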
Re: [Qemu-devel] [PATCH] linux-aio: avoid deadlock in nested aio_poll() calls
On 2014-08-04 17:56, Stefan Hajnoczi wrote:
> If two Linux AIO request completions are fetched in the same io_getevents() call, QEMU will deadlock if request A's callback waits for request B to complete using an aio_poll() loop. This was reported to happen with the mirror blockjob.

s/mirror/commit/

> This patch moves completion processing into a BH and makes it resumable. Nested event loops can resume completion processing so that request B will complete and the deadlock will not occur.
>
> Cc: Kevin Wolf
> Cc: Paolo Bonzini
> Cc: Ming Lei
> Cc: Marcin Gibuła
> Reported-by: Marcin Gibuła
> Signed-off-by: Stefan Hajnoczi

I'll test it tomorrow.

--
mg
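[Editorial note: the core idea of the patch, as described above, can be sketched like this - a simplified illustration of the technique, not the actual linux-aio.c code; the struct layout and the laio_* helper names are invented for the example:]

#include <libaio.h>     /* io_context_t, struct io_event, io_getevents() */
#include "block/aio.h"  /* QEMUBH, qemu_bh_schedule(), qemu_bh_cancel() */

struct laio_state {
    io_context_t ctx;             /* libaio context */
    QEMUBH *completion_bh;        /* bottom half driving completions */
    struct io_event events[128];  /* batch fetched by io_getevents() */
    int event_idx;                /* resume point within the batch */
    int event_max;                /* number of valid entries in events[] */
};

static void laio_completion_bh(void *opaque)
{
    struct laio_state *s = opaque;

    /* Fetch a new batch only when the previous one is fully consumed
     * (error handling elided for brevity). */
    if (s->event_idx == s->event_max) {
        s->event_max = io_getevents(s->ctx, 0, 128, s->events, NULL);
        s->event_idx = 0;
    }

    /* Keep the BH scheduled while events remain, so a nested aio_poll()
     * entered from a completion callback re-runs this BH and drains the
     * rest of the batch - letting request B finish while A's callback
     * is still waiting. */
    qemu_bh_schedule(s->completion_bh);

    /* Progress lives in s->event_idx, not on the stack, so processing
     * is resumable across nested event loops. */
    while (s->event_idx < s->event_max) {
        struct io_event *ev = &s->events[s->event_idx++];
        laio_process_completion(s, ev);  /* hypothetical helper; may nest aio_poll() */
    }

    qemu_bh_cancel(s->completion_bh);    /* batch drained; stop for now */
}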
Re: [Qemu-devel] latest rc: virtio-blk hangs forever after migration
On 2014-07-31 13:27, Marcin Gibuła wrote:
>>> Can you dump *env before and after the call to kvm_arch_get_registers?
>>
>> Yes, but it seems they are equal - I used memcmp() to compare them.
>>
>> Is there any other side effect that cpu_synchronize_all_states() may have?
>
> I think I found it. The reason for the hang is that when the second call to kvm_arch_get_registers() is skipped, it also skips kvm_get_apic(), which updates cpu->apic_state.

Paolo, is this analysis deep enough for you?

I don't know if that can be fixed with the existing API, as cpu_synchronize_all_states() is an all-or-nothing kind of thing. Kvmclock needs it only to read the current cpu registers, so syncing everything is not really necessary. Perhaps exporting one of the kvm_arch_get_* functions would be enough. And it wouldn't mess with the lazy get/put.

On the other hand, if in the future any other driver adds cpu_synchronize_all_states() to its state change callback, it could result in the same error, so perhaps a more generic approach is needed.

--
mg
Re: [Qemu-devel] [PATCH] linux-aio: avoid deadlock in nested aio_poll() calls
On 04.08.2014 17:56, Stefan Hajnoczi wrote:
> If two Linux AIO request completions are fetched in the same io_getevents() call, QEMU will deadlock if request A's callback waits for request B to complete using an aio_poll() loop. This was reported to happen with the mirror blockjob.
>
> This patch moves completion processing into a BH and makes it resumable. Nested event loops can resume completion processing so that request B will complete and the deadlock will not occur.
>
> Cc: Kevin Wolf
> Cc: Paolo Bonzini
> Cc: Ming Lei
> Cc: Marcin Gibuła
> Reported-by: Marcin Gibuła
> Signed-off-by: Stefan Hajnoczi

Still hangs... The backtrace still looks like this:

Thread 1 (Thread 0x7f3d5313a900 (LWP 17440)):
#0 0x7f3d4f38f286 in ppoll () from /lib64/libc.so.6
#1 0x7f3d5347465b in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=, nfds=, timeout=) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/qemu-timer.c:314
#3 0x7f3d53475970 in aio_poll (ctx=ctx@entry=0x7f3d54270c00, blocking=blocking@entry=true) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/aio-posix.c:250
#4 0x7f3d534695e7 in bdrv_drain_all () at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block.c:1924
#5 0x7f3d5346fe1f in bdrv_close (bs=bs@entry=0x7f3d5579b340) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block.c:1820
#6 0x7f3d53470047 in bdrv_delete (bs=0x7f3d5579b340) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block.c:2094
#7 bdrv_unref (bs=0x7f3d5579b340) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block.c:5376
#8 0x7f3d5347030b in bdrv_drop_intermediate (active=active@entry=0x7f3d54635e20, top=top@entry=0x7f3d5579b340, base=base@entry=0x7f3d54d956b0, backing_file_str=0x7f3d54d95700 "/mnt/nfs/volumes/7c13c27f-0c48-4676-b075-6e8a3325383e/3785abe6-d2df-49da-9cba-e15cfce8e2af.qcow2") at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block.c:2643
#9 0x7f3d5335121a in commit_run (opaque=0x7f3d545cdac0) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/block/commit.c:145
#10 0x7f3d5347ebca in coroutine_trampoline (i0=, i1=) at /var/tmp/portage/app-emulation/qemu-2.1.0/work/qemu-2.1.0/coroutine-ucontext.c:118
#11 0x7f3d4f2f49f0 in ?? () from /lib64/libc.so.6
#12 0x7fff27d5ef50 in ?? ()
#13 0x in ?? ()

--
mg
Re: [Qemu-devel] [PATCH] linux-aio: avoid deadlock in nested aio_poll() calls
On 05.08.2014 16:26, Marcin Gibuła wrote:
> On 04.08.2014 17:56, Stefan Hajnoczi wrote:
>> If two Linux AIO request completions are fetched in the same io_getevents() call, QEMU will deadlock if request A's callback waits for request B to complete using an aio_poll() loop. This was reported to happen with the mirror blockjob.
>>
>> This patch moves completion processing into a BH and makes it resumable. Nested event loops can resume completion processing so that request B will complete and the deadlock will not occur.
>>
>> Cc: Kevin Wolf
>> Cc: Paolo Bonzini
>> Cc: Ming Lei
>> Cc: Marcin Gibuła
>> Reported-by: Marcin Gibuła
>> Signed-off-by: Stefan Hajnoczi
>
> Still hangs...

I'm sorry, ignore this comment - I had built my test qemu without aio support. Retesting now.

--
mg
Re: [Qemu-devel] [PATCH v2 0/2] thread-pool: avoid fd usage and fix nested aio_poll() deadlock
On 15.07.2014 17:17, Paolo Bonzini wrote:
> On 15/07/2014 16:44, Stefan Hajnoczi wrote:
>> v2:
>>  * Leave BH scheduled so that the code can be simplified [Paolo]
>>
>> These patches convert thread-pool.c from EventNotifier to QEMUBH. They then solve the deadlock when nested aio_poll() calls are made.
>>
>> Please speak out whether you want this in QEMU 2.1 or not. I'm not aware of the nested aio_poll() deadlock ever having been reported, so maybe we can defer to QEMU 2.2.
>
> It was reported as a hang in block_commit. Marcin, can you please test these patches?

Sorry for the late answer - yes, it seems to fix the block_commit hang when using the thread pool.

--
mg
Re: [Qemu-devel] [PATCH] linux-aio: avoid deadlock in nested aio_poll() calls
On 2014-08-04 17:56, Stefan Hajnoczi wrote:
> If two Linux AIO request completions are fetched in the same io_getevents() call, QEMU will deadlock if request A's callback waits for request B to complete using an aio_poll() loop. This was reported to happen with the mirror blockjob.
>
> This patch moves completion processing into a BH and makes it resumable. Nested event loops can resume completion processing so that request B will complete and the deadlock will not occur.
>
> Cc: Kevin Wolf
> Cc: Paolo Bonzini
> Cc: Ming Lei
> Cc: Marcin Gibuła
> Reported-by: Marcin Gibuła
> Signed-off-by: Stefan Hajnoczi

This patch fixes the block-commit hang when using linux-aio, so:

Tested-by: Marcin Gibuła

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
On 2014-03-27 23:52, Chris Dunlop wrote:
> Hi, I have a problem where I migrate a linux guest VM, and on the receiving side the guest goes to 100% cpu as seen by the host, and the guest itself is unresponsive, e.g. not responding to ping etc. The only way out I've found is to destroy the guest. This seems to only happen if the guest has been idle for an extended period (e.g. overnight). I've migrated the guest 100 times in a row without any problems when the guest has been used "a little" (e.g. logging in and looking around; it's not doing anything normally).

Hi, I've seen a very similar problem on our installation. Have you tried running with kvm-clock explicitly disabled (either via no-kvmclock in the guest kernel or with -kvm-clock in qemu)?

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
>> I've seen a very similar problem on our installation. Have you tried running with kvm-clock explicitly disabled (either via no-kvmclock in the guest kernel or with -kvm-clock in qemu)?
>
> No, I haven't tried it yet (I've confirmed kvm-clock is currently being used). I'll have a look at it. Did it help your issue?

My results were inconclusive, but there was a guy two months ago who had the same problem, and disabling kvm-clock resolved it for him. I wonder if it'll help you as well.

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
> It's looking good so far, after a few migrations (it takes a while to test because I'm waiting at least 5 hours between migrations). I'll be happier once I've done a couple of weeks of this without any failures! Does anyone have any hints how to debug this thing? :(

I've tried to put a hung guest under gdb and found it's looping deep inside kernel time management functions. That disabling kvmclock helps suggests it is somehow related to kvmclock corruption during migration. It happens on both old and new versions of guest kernels.

Any hints from developers are welcome :)

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
> Can you give:
>
> 1) A backtrace from the guest
>
>     thread apply all bt full
>
> in gdb

You mean from gdb attached to the hung guest? I'll try to get it. From what I remember it looks rather "normal" - busy executing guest code.

> 2) What's the earliest/newest qemu versions you've seen this on?

1.4 - 1.6. Don't know about earlier versions because I didn't use migration on them. Haven't tried 1.7 yet (I know about the XBZRLE fixes, but it happened without it as well...).

> 3) What guest OS are you running?

All flavors of Centos, Ubuntu, Redhat, etc. Also Windows, but never seen a crash with Windows so far. It seems that the few people who also have this issue report success with kvmclock disabled (either in qemu or on the kernel command line).

> 4) What host OS are you running?

The distro is Gentoo based (with no crazy compiler options). I've been using kernels 3.4 - 3.10.

> 5) What CPU are you running on?

AMD Opteron(tm) Processor 6164 HE

> 6) What does your qemu command line look like?

Example VM:

/usr/bin/qemu-system-x86_64 -machine accel=kvm -name 3b5e37ea-04be-4a6b-8d63-f1a5853f2138 -S -machine pc-i440fx-1.5,accel=kvm,usb=off -cpu qemu64,+misalignsse,+abm,+lahf_lm,+rdtscp,+popcnt,+x2apic,-svm,+kvmclock -m 1024 -realtime mlock=on -smp 2,sockets=4,cores=12,threads=1 -uuid 3b5e37ea-04be-4a6b-8d63-f1a5853f2138 -smbios type=0,vendor=HAL 9000 -smbios type=1,manufacturer=cloud -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/3b5e37ea-04be-4a6b-8d63-f1a5853f2138.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -no-hpet -no-kvm-pit-reinjection -no-shutdown -boot menu=off -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/dev/stor1c/2e7fd7aa-8588-47ed-a091-af2b81c9e935,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:11:11:11:11,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/f16x86_64.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1 -device usb-tablet,id=input0 -vnc 0.0.0.0:4,password -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -sandbox on

I've tried playing with a different CPU model (Opteron_G3) and flags; it didn't make any difference.

> 7) How exactly are you migrating?

Via libvirt live migration. Seen it with and without XBZRLE enabled.

> 8) You talk about having to wait a few hours to trigger it - do you have a more exact description of a test?

Yes, that's where it gets weird. I've never seen this on a fresh VM. It needs to be idle for a couple of hours at least. And even then it doesn't always hang.

> 9) Is there any output from qemu stderr/stdout in your qemu logs?

Nothing unusual. From QEMU's point of view the guest is up and running. Only its OS is hung (but not panicked; there is no backtrace, oops or BUG on its screen).

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
On 02.04.2014 11:39, Dr. David Alan Gilbert wrote:
> * Marcin Gibuła (m.gib...@beyond.pl) wrote:
>>> Can you give:
>>>
>>> 1) A backtrace from the guest
>>>
>>>     thread apply all bt full
>>>
>>> in gdb
>>
>> You mean from gdb attached to the hung guest? I'll try to get it. From what I remember it looks rather "normal" - busy executing guest code.
>
> yes; if you can send it a sysrq to trigger a backtrace it might also be worth a try - I'm just trying to find what the guest is really doing when it's apparently 'hung'.

IIRC the VM doesn't respond to the sysrq key sequence. It doesn't respond to anything, actually, but NMI. I tried to do inject-nmi. The VM's kernel responded with the timestamped message "Uhhuh. NMI received. Dazed and confused, but trying to continue". That timestamp never changes - it's like time is frozen in the VM.

I'll try to find my notes from this gdb session.

--
mg
Re: [Qemu-devel] Unresponsive linux guest once migrated
>> Yes, that's where it gets weird. I've never seen this on a fresh VM. It needs to be idle for a couple of hours at least. And even then it doesn't always hang.
>
> So your OS is just sitting at a text console, running nothing special? When you reboot after the migration what's the last thing you see in the guests logs? Is there anything from after the migration?

Yes, it's completely idle. After reboot there is nothing in the logs. I've dumped the memory of one of the hung test VMs and found the kernel message buffer. The last entries were:

init: failsafe main process (659) killed by TERM signal
init: plymouth-upstart-bridge main process (651) killed by TERM signal
Clocksource tsc unstable (delta = 470666274 ns)
Uhhuh. NMI received for unknown reason 30 on CPU 0.
Do you have a strange power saving mode enabled?I: Dazed and confused, but trying to continue
Uhhuh. NMI received for unknown reason 20 on CPU 0.
Do you have a strange power saving mode enabled?I: Dazed and confused, but trying to continue
<0>Dazed and confused, but trying to continue

I've tried to disassemble where the VM kernel (3.8.something from Ubuntu) is spinning (using the qemu monitor, registers info, and symbols from the guest kernel) and it was a loop inside the __run_timers function from kernel/timer.c:

while (time_after_eq(jiffies, base->timer_jiffies)) {
...
}

However, my disassembly and qemu debugging skills are limited - would it help if I dump the memory of a broken VM and send it to you somehow?

--
mg
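[Editorial note: this loop is consistent with the clock-jump theory. A paraphrased sketch of the relevant part of kernel/timer.c (simplified; the real function expires pending timers in the loop body):]

/* base->timer_jiffies is the last tick this timer base has processed.
 * If jiffies suddenly jumps far into the future - e.g. because the
 * clocksource went wrong across a migration - this loop has an
 * enormous number of iterations to catch up on, one jiffy at a time,
 * which from the outside looks like a CPU pegged at 100%. */
while (time_after_eq(jiffies, base->timer_jiffies)) {
        /* ... run timers that expire at base->timer_jiffies ... */
        base->timer_jiffies++;
}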
[Qemu-devel] qemu 2.0.0-rc2 crash
Hi, I've been playing with QEMU 2.0-rc2 and found a crash that isn't there in 1.7.1. The virtual machine is created via libvirt, and when I query it with 'dommemstat' it crashes with the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x7f5883655c0a in object_class_dynamic_cast (class=0x7f588618fbb0, typename=typename@entry=0x7f58837ebe54 "object") at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c:525
525 /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c: No such file or directory.
(gdb) bt
#0 0x7f5883655c0a in object_class_dynamic_cast (class=0x7f588618fbb0, typename=typename@entry=0x7f58837ebe54 "object") at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c:525
#1 0x7f5883655da5 in object_dynamic_cast (obj=0x7f58861604c0, typename=typename@entry=0x7f58837ebe54 "object") at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c:456
#2 0x7f5883657d6e in object_resolve_abs_path (parent=<optimized out>, parts=parts@entry=0x7f5886352ad0, typename=typename@entry=0x7f58837ebe54 "object", index=index@entry=1) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c:1244
#3 0x7f5883657f20 in object_resolve_path_type (path=<optimized out>, typename=0x7f58837ebe54 "object", ambiguous=0x7fff1ccab257) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qom/object.c:1312
#4 0x7f5883652d7f in qmp_qom_list (path=0x7f588615c9a0 "//machine/i440fx/pci.0/child[9]", errp=errp@entry=0x7fff1ccab290) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qmp.c:201
#5 0x7f588364dd55 in qmp_marshal_input_qom_list (mon=<optimized out>, qdict=, ret=0x7fff1ccab310) at qmp-marshal.c:2490
#6 0x7f58836ef4e8 in qmp_call_cmd (params=0x7f58893626b0, mon=0x7f5885c9ec90, cmd=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/monitor.c:4760
#7 handle_qmp_command (parser=, tokens=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/monitor.c:4826
#8 0x7f588378289a in json_message_process_token (lexer=0x7f5885ca00a0, token=0x7f58861a0500, type=JSON_OPERATOR, x=95, y=20) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qobject/json-streamer.c:87
#9 0x7f5883797c4f in json_lexer_feed_char (lexer=lexer@entry=0x7f5885ca00a0, ch=125 '}', flush=flush@entry=false) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qobject/json-lexer.c:303
#10 0x7f5883797d96 in json_lexer_feed (lexer=0x7f5885ca00a0, buffer=, size=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qobject/json-lexer.c:356
#11 0x7f5883782ab1 in json_message_parser_feed (parser=<optimized out>, buffer=, size=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qobject/json-streamer.c:110
#12 0x7f58836ed593 in monitor_control_read (opaque=, buf=, size=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/monitor.c:4847
#13 0x7f588363d4e1 in qemu_chr_be_write (len=, buf=0x7fff1ccab4f0 "}", s=0x7f5885caf0b0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-char.c:165
#14 tcp_chr_read (chan=, cond=, opaque=0x7f5885caf0b0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-char.c:2487
#15 0x7f58814d0b75 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#16 0x7f588360b0e8 in glib_pollfds_poll () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:190
#17 os_host_main_loop_wait (timeout=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:235
#18 main_loop_wait (nonblocking=) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:484
#19 0x7f58834dbb6e in main_loop () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/vl.c:2051
#20 main (argc=, argv=, envp=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/vl.c:4507

Virtual machine command line:

LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/ USER=root QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name f1b3b8b7-7b0e-4eab-afef-06d577d6544d -S -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,-kvmclock -m 4096 -realtime mlock=on -smp 4,sockets=2,cores=10,threads=1 -uuid f1b3b8b7-7b0e-4eab-afef-06d577d6544d -smbios type=0,vendor=HAL 9000 -smbios type=1,manufacturer=cloud -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f1b3b8b7-7b0e-4eab-afef-06d577d6544d.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,
Re: [Qemu-devel] qemu 2.0.0-rc2 crash
On 2014-04-10 15:43, Marcel Apfelbaum wrote:
> On Thu, 2014-04-10 at 14:55 +0200, Marcin Gibuła wrote:
>> Hi, I've been playing with QEMU 2.0-rc2 and found a crash that isn't there in 1.7.1.
> Hi Marcin, thanks for reporting the bug! Do you have a development environment? If you do, and the reproduction is fast (and you already have a setup), a git bisect to find the problematic commit would be appreciated.

Hi, yes, it's a development environment. If you could point me to some quick guide to bisecting qemu, I'll be happy to do it.
-- mg
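For reference, a typical qemu bisect session looks roughly like this (a sketch - the repository URL, tag names and configure flags are assumptions to adapt to your checkout):

git clone git://git.qemu.org/qemu.git && cd qemu
git bisect start
git bisect bad v2.0.0-rc2        # first version known to crash
git bisect good v1.7.1           # last version known to work
# at each step git checks out a candidate commit; build and test it:
./configure --target-list=x86_64-softmmu && make -j8
# run the dommemstat test against the new binary, then mark the result:
git bisect good                  # or: git bisect bad
# repeat until git names the first bad commit, then clean up:
git bisect reset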
Re: [Qemu-devel] troubleshooting live migration
> I tried -no-hpet, was still able to replicate the 'lapic' issue. I find it interesting that I can only trigger it if the vm has been running awhile.

Hi, I've seen identical crashes with live migration in our environment. It looks identical - the VM has to be idle for some time, and after migration the CPU is at 100% and the VM is dead. All migration happens between identical hardware. I don't think I've ever had a Windows guest crashing like this, and I think this is somehow related to kvmclock. I've tried to debug the qemu guest process and, from what I can tell, its kernel is busy-looping in some time-management-related functions. Could you try to reproduce this issue with -no-kvmclock? Our testing environment is currently offline, so I can't test it myself. We also use a 3.10 kernel (though 3.8 wasn't working either) and struggled with this issue with qemu 1.4, 1.5 and 1.6. Didn't test 1.7. Also, we're using AMD CPUs, so it seems to be platform-independent.
-- mg
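For reference, kvmclock can also be disabled per-VM by masking the CPUID feature, as the command lines later in this thread do (-cpu SandyBridge,-kvmclock). With libvirt, the equivalent guest XML is believed to be the following (verify against your libvirt version):

<clock offset='utc'>
  <timer name='kvmclock' present='no'/>
</clock>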
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
On 06.02.2014 15:03, Stefan Priebe - Profihost AG wrote:
> some more things which happen during migration:
> php5.2[20258]: segfault at a0 ip 00740656 sp 7fff53b694a0 error 4 in php-cgi[40+6d7000]
> php5.2[20249]: segfault at c ip 7f1fb8ecb2b8 sp 7fff642d9c20 error 4 in ZendOptimizer.so[7f1fb8e71000+147000]
> cron[3154]: segfault at 7f0008a70ed4 ip 7fc890b9d440 sp 7fff08a6f9b0 error 4 in libc-2.13.so[7fc890b67000+182000]

Hi, I've seen memory corruption after live (and offline) migrations as well. In our environment it mostly (but not only) shows up as timer corruption - the guest hangs or has an insane date in the future. But I've seen segfaults and oopses as well. Sadly, it's very hard for me to reproduce reliably, but it occurs on all types of Linux guests - all versions of Ubuntu, CentOS, Debian, etc. - so it doesn't seem to be tied to a specific guest kernel version. I've never seen Windows crashing, though. There was another guy here on qemu-devel who had a similar issue and fixed it by running the guest with no-kvmclock. I've tested qemu 1.4 - 1.6 and kernels 3.4 - 3.10.
-- mg
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
>> do you use xbzrle for live migration?
> no - I'm really stuck right now with this. Biggest problem: I can't reproduce it with test machines ;-(
> Only being able to test on your production VMs isn't fun; is it possible for you to run an extra program on these VMs - e.g. if we came up with a simple (userland) memory test?
> You mean to reproduce? I already tried https://code.google.com/p/stressapptest/ while migrating on a test VM, but this works fine. I also tried running a mysql bench while migrating on a test VM, and this works too ;-(

Have you tried letting the test VM run idle for some time before migrating? (like 18-24 hours) Having the same (or very similar) problem, I had better luck reproducing it by not using freshly started VMs.
-- mg
Re: [Qemu-devel] migration question: disk images on nfs server
> For NFS you need to use the sync mount option to force the NFS client to sync to the server on writes.

Isn't opening with O_DIRECT enough? (for the Linux NFS client, at least)
-- mg
Re: [Qemu-devel] [pve-devel] QEMU Live Migration - swap_free: Bad swap file entry
>> You mean to reproduce?
> I'm more interested in seeing what type of corruption is happening; if you've got a test VM that corrupts memory, and we can run a program in that VM that writes a known pattern into memory and checks it, then see what changed after migration, it might give a clue. But obviously this would only be of any use if run on the VM that actually fails.

Hi, seeing a similar issue in my company, I would be happy to run such tests. Do you have any test suite I could run, or some leads on how to write it?
-- mg
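Something like the following might do as a starting point - a minimal sketch of my own, not an existing suite; it assumes the corruption would show up in ordinary anonymous memory, and uses a position-dependent pattern so a hit identifies both the damaged offset and where the bad data came from:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>

#define MEM_SIZE (512UL * 1024 * 1024)   /* adjust to the guest's RAM */

int main(void)
{
    uint64_t *buf = malloc(MEM_SIZE);
    size_t n = MEM_SIZE / sizeof(uint64_t);
    size_t i;

    if (!buf) {
        perror("malloc");
        return 1;
    }
    /* Fill with a position-dependent pattern. */
    for (i = 0; i < n; i++) {
        buf[i] = (uint64_t)i * 0x9E3779B97F4A7C15ULL;
    }
    /* Re-verify forever; run this across the migration. */
    for (;;) {
        for (i = 0; i < n; i++) {
            uint64_t expect = (uint64_t)i * 0x9E3779B97F4A7C15ULL;
            if (buf[i] != expect) {
                printf("corruption at offset %zu: got %016llx, expected %016llx\n",
                       i * sizeof(uint64_t),
                       (unsigned long long)buf[i],
                       (unsigned long long)expect);
                buf[i] = expect; /* re-arm so further hits are visible */
            }
        }
        sleep(1); /* keep the VM mostly idle, as in the failing case */
    }
}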
Re: [Qemu-devel] migration question: disk images on nfs server
> It is more an NFS issue: if you have a file on NFS that two users on two different hosts are accessing (at least one of them writing to it), you will need to enforce the "sync" option. Even if you flush all the data and close the file, the NFS client can still have cached data that it didn't sync to the server.

Do you know if this applies to Linux O_DIRECT writes as well? From the comment in fs/nfs/direct.c:

 * When an application requests uncached I/O, all read and write requests
 * are made directly to the server; data stored or fetched via these
 * requests is not cached in the Linux page cache. The client does not
 * correct unaligned requests from applications. All requested bytes are
 * held on permanent storage before a direct write system call returns to
 * an application.
-- mg
Re: [Qemu-devel] migration question: disk images on nfs server
On 07.02.2014 14:36, Orit Wasserman wrote:
>> Do you know if this applies to Linux O_DIRECT writes as well?
> From the man page of open(2):
>
>   The behaviour of O_DIRECT with NFS will differ from local filesystems. Older kernels, or kernels configured in certain ways, may not support this combination. The NFS protocol does not support passing the flag to the server, so O_DIRECT I/O will bypass the page cache only on the client; the server may still cache the I/O. The client asks the server to make the I/O synchronous to preserve the synchronous semantics of O_DIRECT. Some servers will perform poorly under these circumstances, especially if the I/O size is small. Some servers may also be configured to lie to clients about the I/O having reached stable storage; this will avoid the performance penalty at some risk to data integrity in the event of server power failure. The Linux NFS client places no alignment restrictions on O_DIRECT I/O.
>
> To summarize: it depends on your kernel (NFS client).

So, assuming a new kernel (where NFS O_DIRECT translates to no caching on the client side) and a cache-coherent server, is that enough, or is the 'sync' mount option (or the O_SYNC flag) still required for some reason?
-- mg
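For context, this is the open() combination under discussion (a sketch; the path is made up). O_DIRECT requires aligned buffers and offsets, and adding O_SYNC/O_DSYNC - roughly what qemu's cache=directsync adds on top of cache=none - is what would force stable-storage semantics if plain O_DIRECT turned out not to be enough:

#define _GNU_SOURCE          /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* cache=none corresponds to O_DIRECT; uncomment O_SYNC to also
     * request a stable write on every request. */
    int fd = open("/mnt/nfs/test.img", O_RDWR | O_DIRECT /* | O_SYNC */);
    void *buf;

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* O_DIRECT needs aligned buffers; 4096 covers common block sizes. */
    if (posix_memalign(&buf, 4096, 4096)) {
        return 1;
    }
    memset(buf, 0xAB, 4096);
    if (pwrite(fd, buf, 4096, 0) != 4096) {
        perror("pwrite");
    }
    close(fd);
    return 0;
}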
Re: [Qemu-devel] Unresponsive linux guest once migrated
On 2014-04-15 20:53, Dr. David Alan Gilbert wrote:
> * Marcus (shadow...@gmail.com) wrote:
>> I can answer some of the questions. It's been 3 months or so since I looked into it. I ended up disabling kvmclock from the qemu command line and moving on. I saw it with CentOS 6.5 and Ubuntu 12.04 guests. Sending the guest to the BIOS CLI or PXE would not reproduce the issue. I didn't attempt an array of qemu versions, but I can say that it did occur on 1.7.0 and 1.6.1, with the host running kernel 3.10 or 3.12. The CPUs are Intel E5-2650.
> If you could test it with the latest 2.0.x-rc, that would be interesting to know, since you have a setup where it fails for you.

Hi, I'll soon be able to test it with the newest version. And if it fails - what next steps should I take to help debug it? The VM is usually pretty much dead and unresponsive to anything but NMI.
-- mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
On 2014-05-05 15:51, Alexander Graf wrote:
> When we migrate we ask the kernel about its current belief on what the guest time would be. However, I've seen cases where the kvmclock guest structure indicates a time more recent than the kvm returned time.

Hi, is it possible for kvmclock to jump forward as well? I have a reproducible case where, in about 1 out of 20 VM restores, the VM freezes for a couple of hours and then resumes with a date a few hundred years ahead. It happens only with kvmclock. And this patch seems to fix a very similar issue, so maybe it's all the same bug.
-- mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
>> is it possible for kvmclock to jump forward as well? I have a reproducible case where, in about 1 out of 20 VM restores, the VM freezes for a couple of hours and then resumes with a date a few hundred years ahead. It happens only with kvmclock. And this patch seems to fix a very similar issue, so maybe it's all the same bug.
> I'm fairly sure it is the exact same bug. Jumping backward is like jumping forward by a big amount :)

Hi, I've tested your patch on my test VM... don't know if it's pure luck or not, but it didn't hang in over 70 restores. The message "KVM Clock migrated backwards, using later time" fires every time, but the VM is healthy after resume.
-- mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
> What is the host clocksource? (cat /sys/devices/system/clocksource/clocksource0/current_clocksource)
tsc
> And kernel version?
3.12.17
But I've seen this problem on earlier versions as well (3.8, 3.10).
-- mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
> Yes, and it isn't. Any ideas why it's not? This patch really just uses the guest-visible kvmclock time rather than the host view of it on migration. There is definitely something very broken on the host's side, since it does return a smaller time than the guest-exposed interface indicates.

Don't know if it helps, but here are example values of time_at_migration and s->clock from your patch, taken over 5 restores of a saved VM that (used to) hang:

s->clock         time_at_migration
157082235125698  157113284546655
157082235125698  157113298196976
157082235125698  157113284615117
157082235125698  157113284486601
157082235125698  157113284479740

Now, comparing the guest's system time with and without the patch: on unpatched qemu the VM restores with date Apr 18 06:56:36; on patched qemu it says Apr 18 06:57:06.
-- mg
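(For reference, the gap is consistent with the observed guest times: 157113284546655 - 157082235125698 = 31049420957 ns, i.e. about 31 s, which matches the ~30 s difference between the two restore dates above.)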
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> Two options for making progress on this bug:
> 1. Debug bdrv_drain_all() and find out whether there are any I/O requests remaining.

Yes, there is one request pending on the active layer of the disk that is being committed (on the bs->tracked_requests list). The I/O threads die off because they have nothing to do... it seems that requests are somehow not committed into the threads. I tried hard (and will continue to try) to debug this, but documentation is limited :-) so ANY tips on where to look are welcome.

> 2. Post steps for reproducing this problem (exact command-lines or virsh commands used).

I'm using an application that talks to libvirt via the API, so I'll describe what it does:
1. Create a VM, boot a system. I'm using the iso from http://www.sysresccd.org
2. The VM has a mounted QCOW2 disk with the following hierarchy: [file1] -> [file2 (active)]. Both are qcow2 files.
3. Open a console and start the command: while true; do dd if=/dev/zero of=/dev/vdX bs=512k oflag=direct; done; - where vdX is, of course, the qcow2 disk described above.
4. Create a snapshot of file2 (virDomainSnapshotCreateXML). So now we have: [file1] -> [file2] -> [file3 (active)]
5. Wait a couple of seconds (so the snapshot fills up).
6. Commit file2 into file1 (virDomainBlockCommit).
7. During the commit, another thread uses virDomainGetBlockJobInfo() to query its progress.

Note - it doesn't always happen; I see about a 1-in-10 failure rate with this procedure. Do you want me to reproduce it manually with pure virsh? (A rough virsh equivalent is sketched below.)
-- mg
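For reference, a rough virsh equivalent of steps 4-7 might look like this (a sketch - the domain name, disk target and paths are made up, and option spellings should be checked against your libvirt version):

virsh snapshot-create-as DOM snap1 --disk-only --no-metadata \
      --diskspec vdb,file=/path/file3.qcow2
virsh blockcommit DOM vdb --top /path/file2.qcow2 --base /path/file1.qcow2
virsh blockjob DOM vdb --info     # poll progress, as in step 7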
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> What happens if you omit #7 virDomainGetBlockJobInfo()? Does it still hang 1/10 times?

Yes, it still hangs.

> Can you post the QEMU command-line so we know the precise VM configuration? (ps aux | grep qemu)

/usr/bin/qemu-system-x86_64 -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,-kvmclock -m 1536 -realtime mlock=on -smp 2,sockets=2,cores=10,threads=1 -uuid 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/68189c3c-02f6-4aae-88a2-5f13c5e6f53a.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/mnt/nfs/volumes/7dcbd9ba-f0bc-4d3c-9b5c-b2ac824584d5/b6ed3ffc-ddca-4f10-839b-81a5b1ce371f.qcow2,if=none,id=drive-virtio-disk5,format=qcow2,cache=none,aio=threads,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk5,id=virtio-disk5,bootindex=2 -drive file=/root/rescue.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:82:41:c9,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 0.0.0.0:2,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -sandbox on
-- mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> [QEMU command line quoted in full in the previous message]
> Please try disabling I/O limits on the drive and try again.

Still hangs.
-- mg
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> Please try disabling I/O limits on the drive and try again.

Is there anything else I could try? I've captured a trace of the hung VM with the following events traced:

bdrv_*
paio_*
thread_pool_*
commit_*
qcow2_*

plus debug code that prints the requests from bs->tracked_requests in the bdrv_requests_pending function. It's available here: http://filebin.net/tmscfay2pa/hanged-trace.gz (50 MB after decompression)
-- mg
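For anyone trying to capture a similar trace: with a QEMU built with the simple tracing backend, an event list like the one above goes into a plain file, one pattern per line, passed via -trace (a sketch; whether wildcard patterns are accepted depends on the QEMU version - otherwise list the events individually):

$ cat /tmp/events
bdrv_*
paio_*
thread_pool_*
commit_*
qcow2_*
$ qemu-system-x86_64 -trace events=/tmp/events,file=/tmp/trace.out ...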
Re: [Qemu-devel] qemu 2.0, deadlock in block-commit
> 1. Debug bdrv_drain_all() and find out whether there are any I/O requests remaining.

I believe this is what happens:

Context 1:
- commit_one_iteration makes a write request (req A)
- request A is handed to an I/O thread, qemu_coroutine_yield() is called

Context 2:
- the VM makes a write request (req B)
- request B is inserted into bs->tracked_requests
- request B is handed to an I/O thread, qemu_coroutine_yield() is called
- request A is completed, the bdrv_co_io_em notification is called and jumps into context 1
- meanwhile request B is completed; the main thread is currently executing context 1

Context 1:
- calls bdrv_drain_all
- calls bdrv_requests_pending_all, which returns true, as bs->tracked_requests is not empty (it still has req B)
- calls aio_poll, which hangs, as req B has already completed but its notification has not been called yet (this part I'm not sure about, but it hangs forever for some reason...)

This is based on the traces and debug prints I collected. I've made a patch that moves bdrv_drop_intermediate() into a separate bottom half and couldn't recreate the hang after this. But it probably affects mirror_run as well, so I don't know if this is an acceptable solution for you.
-- mg
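The "hangs forever" part is the classic lost-wakeup pattern. Here is a standalone demo of my own (not QEMU code - QEMU's EventNotifier is an eventfd on Linux): one read() acknowledges every completion signalled so far, so a callback that then re-enters the poll loop to wait for an already-signalled completion blocks for good:

#include <stdio.h>
#include <poll.h>
#include <sys/eventfd.h>

int main(void)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    eventfd_t val;

    /* Two worker threads finish and signal their completions... */
    eventfd_write(efd, 1);
    eventfd_write(efd, 1);

    /* ...but a single read() acknowledges both signals at once. */
    eventfd_read(efd, &val);
    printf("one read consumed %llu completion signals\n",
           (unsigned long long)val);

    /* If the first completion's callback now waits for the second
     * completion by polling the descriptor again, nothing will ever
     * wake it up - that signal is already gone: */
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int ready = poll(&pfd, 1, 1000 /* -1 in the real deadlock */);
    printf("poll() saw %d ready descriptors\n", ready);
    return 0;
}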
[Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other
When two coroutines submit I/O, and the first coroutine depends on the second to complete (by calling bdrv_drain_all), a deadlock may occur. This is because both requests may have completed before the thread pool notifier got called. Then, when the notifier gets executed and the first coroutine calls aio_poll() to make progress, it will hang forever, as the notifier's descriptor has already been marked clear. This patch fixes this by re-arming the thread pool notifier if there is more than one completed request on the list. Without this patch, I could reproduce this bug with snapshot-commit about 1 time in 10 tries. With this patch, I couldn't reproduce it any more.

Signed-off-by: Marcin Gibula
---
--- thread-pool.c	2014-04-17 15:44:45.0 +0200
+++ thread-pool.c	2014-05-31 20:20:26.083011514 +0200
@@ -76,6 +76,8 @@ struct ThreadPool {
     int new_threads;     /* backlog of threads we need to create */
     int pending_threads; /* threads created but not running yet */
     int pending_cancellations; /* whether we need a cond_broadcast */
+    int pending_completions; /* whether we need to rearm notifier when
+                                executing callback */
     bool stopping;
 };
 
@@ -110,6 +112,10 @@ static void *worker_thread(void *opaque)
         ret = req->func(req->arg);
         req->ret = ret;
 
+        if (req->common.cb) {
+            pool->pending_completions++;
+        }
+
         /* Write ret before state. */
         smp_wmb();
         req->state = THREAD_DONE;
@@ -185,6 +191,14 @@ restart:
         }
         if (elem->state == THREAD_DONE && elem->common.cb) {
             QLIST_REMOVE(elem, all);
+            /* If more completed requests are waiting, the notifier needs
+             * to be rearmed so the callback can progress with aio_poll().
+             */
+            pool->pending_completions--;
+            if (pool->pending_completions) {
+                event_notifier_set(notifier);
+            }
+
             /* Read state before ret. */
             smp_rmb();
             elem->common.cb(elem->common.opaque, elem->ret);
Re: [Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other
> Good catch! The main problem with the patch is that you need to use atomic_inc/atomic_dec to increment and decrement pool->pending_completions.

Ok.

> Secondarily, event_notifier_set is pretty heavy-weight; does it work if you wrap the loop like this?
>
> restart:
>     QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
>         ...
>     }
>     if (pool->pending_completions) {
>         goto restart;
>     }
>     event_notifier_test_and_clear(notifier);
>     if (pool->pending_completions) {
>         event_notifier_set(notifier);
>         goto restart;
>     }

I'll test it tomorrow. I assume you want to avoid calling event_notifier_set() until the function is re-entered via aio_poll()?

> Finally, the same bug is also in block/linux-aio.c and block/win32-aio.c.

I can try with linux-aio, but my knowledge of the Windows API is zero...
-- mg
[Qemu-devel] [PATCH v2] thread-pool: fix deadlock when callbacks depend on each other
When two coroutines submit I/O, and the first coroutine depends on the second to complete (by calling bdrv_drain_all), a deadlock may occur. This is because both requests may have completed before the thread pool notifier got called. Then, when the notifier gets executed and the first coroutine calls aio_poll() to make progress, it will hang forever, as the notifier's descriptor has already been marked clear. This patch fixes this by deferring the clearing of the notifier until no completions are pending. Without this patch, I could reproduce this bug with snapshot-commit about 1 time in 10 tries. With this patch, I couldn't reproduce it any more.

Signed-off-by: Marcin Gibula
---
--- thread-pool.c	2014-04-17 15:44:45.0 +0200
+++ thread-pool.c	2014-06-02 09:10:25.442260590 +0200
@@ -76,6 +76,8 @@ struct ThreadPool {
     int new_threads;     /* backlog of threads we need to create */
     int pending_threads; /* threads created but not running yet */
     int pending_cancellations; /* whether we need a cond_broadcast */
+    int pending_completions; /* whether we need to rearm notifier when
+                                executing callback */
     bool stopping;
 };
 
@@ -110,6 +112,10 @@ static void *worker_thread(void *opaque)
         ret = req->func(req->arg);
         req->ret = ret;
 
+        if (req->common.cb) {
+            atomic_inc(&pool->pending_completions);
+        }
+
         /* Write ret before state. */
         smp_wmb();
         req->state = THREAD_DONE;
@@ -173,7 +179,6 @@ static void event_notifier_ready(EventNo
     ThreadPool *pool = container_of(notifier, ThreadPool, notifier);
     ThreadPoolElement *elem, *next;
 
-    event_notifier_test_and_clear(notifier);
 restart:
     QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
         if (elem->state != THREAD_CANCELED && elem->state != THREAD_DONE) {
@@ -185,6 +190,8 @@
         }
         if (elem->state == THREAD_DONE && elem->common.cb) {
             QLIST_REMOVE(elem, all);
+            atomic_dec(&pool->pending_completions);
+
             /* Read state before ret. */
             smp_rmb();
             elem->common.cb(elem->common.opaque, elem->ret);
@@ -196,6 +203,19 @@ restart:
             qemu_aio_release(elem);
         }
     }
+
+    /* Double test of pending_completions is necessary to
+     * ensure that there is no race between testing it and
+     * clearing the notifier.
+     */
+    if (atomic_read(&pool->pending_completions)) {
+        goto restart;
+    }
+    event_notifier_test_and_clear(notifier);
+    if (atomic_read(&pool->pending_completions)) {
+        event_notifier_set(notifier);
+        goto restart;
+    }
 }
 
 static void thread_pool_cancel(BlockDriverAIOCB *acb)
Re: [Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other
>> I'll test it tomorrow. I assume you want to avoid calling event_notifier_set() until the function is re-entered via aio_poll()?
> Yes. But actually, I need to check if it's possible to fix bdrv_drain_all. If you're in coroutine context, you can defer the draining to a safe point using a bottom half. If you're not in coroutine context, perhaps bdrv_drain_all has to be made illegal. Which means a bunch of code auditing...

For what it's worth, your solution also works fine; I couldn't recreate the hang with it. An updated patch proposal was posted earlier today.
-- mg
Re: [Qemu-devel] [PATCH v2] kvmclock: Ensure time in migration never goes backward
> +    cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> +
> +    delta = migration_tsc - time.tsc_timestamp;

Hi, when I was testing live storage migration with libvirt, I found out that this patch can cause the virtual machine to hang when completing a mirror job. This is (probably) because kvmclock_current_nsec() is called twice in a row, and on the second call time.tsc_timestamp is larger than migration_tsc. This causes delta to be huge and sets the timer to an invalid value. The double call happens when switching from the old to the new disk (pivoting, in libvirt's nomenclature).

Example values:
First call:  migration_tsc: 12052203518652476, time_tsc: 12052203301565676, delta: 108543400
Second call: migration_tsc: 12052203518652476, time_tsc: 12052204478600322, delta: 9223372036374801885

Perhaps it is worth adding:

    if (time.tsc_timestamp > migration_tsc) {
        return 0;
    }

there? Untested though...
-- mg
Re: [Qemu-devel] [PATCH v2] kvmclock: Ensure time in migration never goes backward
> Can you give this patch a try? It should read the guest TSC values after stopping the VM.

Yes, this patch fixes that. Thanks,
-- mg
Re: [Qemu-devel] [PATCH v2] thread-pool: fix deadlock when callbacks depend on each other
On 04.06.2014 12:01, Stefan Hajnoczi wrote:
> On Mon, Jun 02, 2014 at 09:15:27AM +0200, Marcin Gibuła wrote:
>> When two coroutines submit I/O, and the first coroutine depends on the second to complete (by calling bdrv_drain_all), a deadlock may occur.
> bdrv_drain_all() is a very heavy-weight operation. Coroutines should avoid it if possible. Please post the file/line/function where this call was made; perhaps there is a better way to wait for the other coroutine. This isn't a fix for this bug, but it's a cleanup.

As in the original bug report:

#4  0x7f699c095c0a in bdrv_drain_all () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805
#5  0x7f699c09c87e in bdrv_close (bs=bs@entry=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1695
#6  0x7f699c09c5fa in bdrv_delete (bs=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1978
#7  bdrv_unref (bs=0x7f699f0bc520) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:5198
#8  0x7f699c09c812 in bdrv_drop_intermediate (active=active@entry=0x7f699ebfd330, top=top@entry=0x7f699f0bc520, base=base@entry=0x7f699eec43d0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:2567
#9  0x7f699c0a1963 in commit_run (opaque=0x7f699f17dcc0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block/commit.c:144
#10 0x7f699c0e0dca in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/coroutine-ucontext.c:118

mirror_run probably has this as well. I didn't check the others.
-- mg
Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc value for kvmclock_current_nsec calculation
On 2015-08-14 03:23, Li, Liang Z wrote:
>> On Thu, Aug 13, 2015 at 01:25:29AM +, Li, Liang Z wrote:
>>> Hi Paolo & Marcelo, could you please point out what issue the patch 317b0a6d8ba44e tries to fix? I found that in live migration cpu_synchronize_all_states() is called twice, and it sometimes takes more than 1 ms. I'd like to do some optimization, but I lack knowledge about the background.
>> What the code in 317b0a6d8ba44e requires is to retrieve the TSC value from the kernel.
> I know 317b0a6d8ba44e is there to retrieve the TSC value, but I don't understand why it is needed. During live migration, cpu_synchronize_all_states() will be called later, after stopping kvm-clock; env->tsc will be updated, so is that not enough? Or is there some case, like calling stop_vm(RUN_STATE_PAUSED) or stop_vm(RUN_STATE_DEBUG), that requires updating env->tsc? From googling, I can see that your patch fixed some issue, but I don't know what the exact issue is.

I remember testing these, and as far as I remember, that was the reason: http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00472.html
-- mg
Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc value for kvmclock_current_nsec calculation
> Thanks for your reply. I have read the thread in your email; what is meant by 'switching from old to new disk'? Could you give a detailed description?

The test case was like this (using libvirt):
1. Get a VM running (Linux, using kvmclock),
2. Use blockcopy to copy the disk data from one location to another,
3. Issue blockjob --pivot (to finish the mirroring).

From what I remember, at point 3 the VM is momentarily paused and resumed, so the kvm state change handler is called twice. Without this patch, the VM hung because its time went backwards (or qemu crashed, if the assertion was not compiled out).
-- mg
Re: [Qemu-devel] about the patch kvmclock Ensure proper env->tsc value for kvmclock_current_nsec calculation
> So, the problem is caused by stop_vm(RUN_STATE_PAUSED); in this case env->tsc is not updated, which leads to the issue. Is that right?

I think so.

> If cpu_clean_all_dirty() is needed just for the APIC status reason, I think we can do cpu_synchronize_all_states() in do_vm_stop, after vm_state_notify(), when RUN_STATE_PAUSED is hit; at that point all the device models are stopped and there is no outdated APIC status.

Yes, cpu_clean_all_dirty() was needed because without it, the second call to cpu_synchronize_all_states() (which is done inside qemu_savevm_state_complete(), after kvmclock) does nothing.

> I want to write a patch to fix this issue in another way; could you help verify it in your environment? I would very much appreciate it.

Sure, I'll test it. Both issues were quite easy to reproduce.
-- mg
Re: [Qemu-devel] [RFC 0/2] Reduce the VM downtime about 300us
On 2015-08-25 07:52, Liang Li wrote:
> This patch is for kvm live migration optimization; it fixes the issue which commit 317b0a6d8ba tries to fix in another way, and it can reduce the live migration VM downtime by about 300us.
> *This patch is not tested for the issue commit 317b0a6d8ba tries to fix*

I'll try to test it within the next few days.
-- mg