Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-30 Thread Marcin Gibuła
1. Debug bdrv_drain_all() and find out whether there are any I/O requests remaining.
I believe this is what happens:
Context 1:
- commit_one_iteration makes a write request (req A)
- request A is handed to an I/O thread, and qemu_coroutine_yield() is called
Context 2:
- VM makes write request (req B

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-29 Thread Marcin Gibuła
Please try disabling I/O limits on the drive and try again. Is there anything else I could try? I've captured a trace of the hung VM with the following events traced: bdrv_* paio_* thread_pool_* commit_* qcow2_*, plus debug code that prints requests from traced_requests in bdrv_requests_pending funct

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-28 Thread Marcin Gibuła
/usr/bin/qemu-system-x86_64 -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,-kvmclock -m 1536 -realtime mlock=on -smp 2,sockets=2,cores=10,threads=1 -uuid 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -no-user-config -nodefaults -chardev socket,id=cha

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-28 Thread Stefan Hajnoczi
On Wed, May 28, 2014 at 3:36 PM, Marcin Gibuła wrote:
>> Can you post the QEMU command-line so we know the precise VM
>> configuration? (ps aux | grep qemu)
>
>
> /usr/bin/qemu-system-x86_64 -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S
> -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-28 Thread Marcin Gibuła
What happens if you omit #7 virDomainGetBlockJobInfo()? Does it still hang 1/10 times? Yes, it still hangs. Can you post the QEMU command-line so we know the precise VM configuration? (ps aux | grep qemu) /usr/bin/qemu-system-x86_64 -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S -machine p

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-28 Thread Stefan Hajnoczi
On Mon, May 26, 2014 at 02:58:35PM +0200, Marcin Gibuła wrote:
> >Two options for making progress on this bug:
> >
> >1. Debug bdrv_drain_all() and find out whether there are any I/O
> >requests remaining.
>
> Yes, there is one request pending on active layer of disk that is being
> committed (

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-26 Thread Marcin Gibuła
Two options for making progress on this bug:
1. Debug bdrv_drain_all() and find out whether there are any I/O requests remaining.
Yes, there is one request pending on the active layer of the disk that is being committed (on the bs->tracked_requests list). I/O threads die off because they have nothing t

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-26 Thread Stefan Hajnoczi
On Fri, May 23, 2014 at 06:25:31PM +0200, Marcin Gibuła wrote:
> >If you see a pending request on a RADOS block device (rbd) then it would
> >be good to dig deeper into QEMU's block/rbd.c driver to see why it's not
> >completing that request.
> >
> >Are you using qcow2 on top of rbd?
>
> Hi,
> I'v

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
On 2014-05-23 15:14, Marcin Gibuła wrote:
bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the function that determines for each of the disks in your VM if it still has requests in flight that need to be completed. This function must have returned true even though there is n

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
If you see a pending request on a RADOS block device (rbd) then it would be good to dig deeper into QEMU's block/rbd.c driver to see why it's not completing that request. Are you using qcow2 on top of rbd? Hi, I've already recreated this without rbd and with stock qemu 2.0. -- mg

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Stefan Hajnoczi
On Thu, May 22, 2014 at 10:49:18PM +0200, Marcin Gibuła wrote:
> This is backtrace of qemu process:
>
> (gdb) thread apply all backtrace
[...] a bunch of rbd threads, vnc worker thread, QEMU worker threads
> Thread 1 (Thread 0x7f699bfcd900 (LWP 13647)):
> #0 0x7f6998020286 in ppoll () from

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
The condition that evaluates to true is: if (!QLIST_EMPTY(&bs->tracked_requests)), and it does so for the intermediate qcow2 image that is being committed. By the way, that is also the disk that is being pounded with writes during the commit. -- mg

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the function that determines for each of the disks in your VM if it still has requests in flight that need to be completed. This function must have returned true even though there is nothing to wait for. Can you check which of its

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
I see that you have a mix of aio=native and aio=threads. I can't say much about the aio=native disks (perhaps try to reproduce without them?), but there are definitely no worker threads for the other disks that bdrv_drain_all() would have to wait for.
True. But I/O was being done only on qcow2 disk

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Kevin Wolf
On 23.05.2014 at 11:25, Marcin Gibuła wrote:
> On 23.05.2014 10:19, Paolo Bonzini wrote:
> >On 22/05/2014 23:05, Marcin Gibuła wrote:
> >>Some more info.
> >>VM was doing a lot of write I/O during this test.
> >
> >QEMU is waiting for librados to complete I/O. Can you reproduce it with

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
On 23.05.2014 10:19, Paolo Bonzini wrote:
On 22/05/2014 23:05, Marcin Gibuła wrote:
Some more info. VM was doing a lot of write I/O during this test.
QEMU is waiting for librados to complete I/O. Can you reproduce it with a different driver?
Hi, I've reproduced it without RBD. Backtrace

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Marcin Gibuła
On 23.05.2014 10:19, Paolo Bonzini wrote:
On 22/05/2014 23:05, Marcin Gibuła wrote:
Some more info. VM was doing a lot of write I/O during this test.
QEMU is waiting for librados to complete I/O. Can you reproduce it with a different driver?
I'll try. However, RBD is used only as read-on

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-23 Thread Paolo Bonzini
On 22/05/2014 23:05, Marcin Gibuła wrote:
Some more info. VM was doing a lot of write I/O during this test.
QEMU is waiting for librados to complete I/O. Can you reproduce it with a different driver?
Paolo
ppoll() is listening for these descriptors (from strace): ppoll([{fd=25, events=

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-22 Thread Marcin Gibuła
I've encountered a deadlock in qemu during some stress testing. The test is making snapshots, committing them and constantly querying for block job info.
What is the exact command you used for triggering the block-commit? Was it via direct HMP or QMP, or indirect via libvirt?
Via libvirt.
Were

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-22 Thread Eric Blake
On 05/22/2014 02:49 PM, Marcin Gibuła wrote:
> Hi,
>
> I've encountered a deadlock in qemu during some stress testing. The test
> is making snapshots, committing them and constantly querying for block
> job info.
What is the exact command you used for triggering the block-commit? Was it via direct

Re: [Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-22 Thread Marcin Gibuła
On 2014-05-22 22:49, Marcin Gibuła wrote:
Thread 1 (Thread 0x7f699bfcd900 (LWP 13647)):
#0 0x7f6998020286 in ppoll () from /lib64/libc.so.6
#1 0x7f699c1f3d9b in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=, nfds=, timeout=)

[Qemu-devel] qemu 2.0, deadlock in block-commit

2014-05-22 Thread Marcin Gibuła
Hi, I've encountered a deadlock in qemu during some stress testing. The test is making snapshots, committing them and constantly querying for block job info. The version of QEMU is 2.0.0 rc3 (the backtrace below says rc2, but it's manually patched to rc3), but there seem to be no changes in block l