1. Debug bdrv_drain_all() and find out whether there are any I/O
requests remaining.
I believe that's what happens:
Context 1:
- commit_one_iteration makes write request (req A)
- request A is handed off to an I/O thread, qemu_coroutine_yield() is called
Context 2:
- VM makes write request (req B
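For context, the lifecycle that makes such a request visible to bdrv_drain_all() looks roughly like this. This is a simplified sketch loosely following QEMU 2.0's block.c write path, not verbatim source; function and field names are approximations:

/* Simplified sketch of the QEMU 2.0 write path (not verbatim source;
 * names approximate).  The point: a request stays on bs->tracked_requests
 * for as long as its coroutine is yielded waiting for completion. */
static int coroutine_fn write_path_sketch(BlockDriverState *bs, int64_t offset,
                                          unsigned int bytes, QEMUIOVector *qiov)
{
    BdrvTrackedRequest req;
    int ret;

    /* request becomes visible to bdrv_requests_pending() here */
    tracked_request_begin(&req, bs, offset, bytes, true);

    /* for aio=threads this hands the work to the thread pool and the
     * coroutine yields until the completion callback re-enters it from
     * the main loop */
    ret = bs->drv->bdrv_co_writev(bs, offset >> BDRV_SECTOR_BITS,
                                  bytes >> BDRV_SECTOR_BITS, qiov);

    /* only now does the request leave bs->tracked_requests; if the
     * re-entry never happens, bdrv_drain_all() waits forever */
    tracked_request_end(&req);
    return ret;
}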
Please try disabling I/O limits on the drive and try again.
Is there anything else I could try?
I've captured a trace of the hung VM with the following events traced:
bdrv_*
paio_*
thread_pool_*
commit_*
qcow2_*
and debug code that prints requests from tracked_requests in the
bdrv_requests_pending function.
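In case anyone wants to reproduce that instrumentation, a minimal version could look like the helper below. This is a hypothetical debug hack against QEMU 2.0's block.c, not the actual patch used here; offset/bytes/is_write are the BdrvTrackedRequest fields of that version:

/* Hypothetical debug helper, not the actual patch from this thread:
 * dump whatever is still on bs->tracked_requests.  Call it from
 * bdrv_requests_pending() before the existing checks. */
static void dump_tracked_requests(BlockDriverState *bs)
{
    BdrvTrackedRequest *req;

    QLIST_FOREACH(req, &bs->tracked_requests, list) {
        fprintf(stderr, "%s: pending %s offset=%" PRId64 " bytes=%u\n",
                bdrv_get_device_name(bs),
                req->is_write ? "write" : "read",
                req->offset, req->bytes);
    }
}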
/usr/bin/qemu-system-x86_64 -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S
-machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,-kvmclock -m 1536
-realtime mlock=on -smp 2,sockets=2,cores=10,threads=1 -uuid
68189c3c-02f6-4aae-88a2-5f13c5e6f53a -no-user-config -nodefaults -chardev
socket,id=cha
On Wed, May 28, 2014 at 3:36 PM, Marcin Gibuła wrote:
>> Can you post the QEMU command-line so we know the precise VM
>> configuration? (ps aux | grep qemu)
>
>
> /usr/bin/qemu-system-x86_64 -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a -S
> -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,
What happens if you omit #7 virDomainGetBlockJobInfo()? Does it still
hang 1/10 times?
Yes, it still hangs.
Can you post the QEMU command-line so we know the precise VM
configuration? (ps aux | grep qemu)
/usr/bin/qemu-system-x86_64 -name 68189c3c-02f6-4aae-88a2-5f13c5e6f53a
-S -machine p
On Mon, May 26, 2014 at 02:58:35PM +0200, Marcin Gibuła wrote:
> >Two options for making progress on this bug:
> >
> >1. Debug bdrv_drain_all() and find out whether there are any I/O
> >requests remaining.
>
> Yes, there is one request pending on the active layer of the disk that is being
> committed (on bs->tracked_requests list).
Two options for making progress on this bug:
1. Debug bdrv_drain_all() and find out whether there are any I/O
requests remaining.
Yes, there is one request pending on the active layer of the disk that is being
committed (on bs->tracked_requests list). IO threads die off because they
have nothing to do.
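For context, the loop that never terminates looks roughly like this in QEMU 2.0 (paraphrased from block.c, not verbatim): as long as any BDS still reports pending requests it keeps calling aio_poll(), so a tracked request that nobody ever completes makes it spin with no worker threads left to wake anything up.

/* Rough paraphrase of QEMU 2.0's bdrv_drain_all() (block.c), for
 * illustration only -- see the real source for the exact code. */
void bdrv_drain_all(void)
{
    bool busy = true;
    BlockDriverState *bs;

    while (busy) {
        /* restart requests held back by I/O throttling */
        QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
            bdrv_start_throttled_reqs(bs);
        }

        /* true while any bs->tracked_requests list is non-empty */
        busy = bdrv_requests_pending_all();

        /* if a tracked request can never complete, this loops forever */
        busy |= aio_poll(qemu_get_aio_context(), busy);
    }
}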
On Fri, May 23, 2014 at 06:25:31PM +0200, Marcin Gibuła wrote:
> >If you see a pending request on a RADOS block device (rbd) then it would
> >be good to dig deeper into QEMU's block/rbd.c driver to see why it's not
> >completing that request.
> >
> >Are you using qcow2 on top of rbd?
>
> Hi,
> I've already recreated this without rbd and with stock qemu 2.0.
On 2014-05-23 15:14, Marcin Gibuła wrote:
bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the
function that determines for each of the disks in your VM if it still
has requests in flight that need to be completed. This function must
have returned true even though there is nothing to wait for.
If you see a pending request on a RADOS block device (rbd) then it would
be good to dig deeper into QEMU's block/rbd.c driver to see why it's not
completing that request.
Are you using qcow2 on top of rbd?
Hi,
I've already recreated this without rbd and with stock qemu 2.0.
--
mg
On Thu, May 22, 2014 at 10:49:18PM +0200, Marcin Gibuła wrote:
> This is backtrace of qemu process:
>
> (gdb) thread apply all backtrace
[...] a bunch of rbd threads, vnc worker thread, QEMU worker threads
> Thread 1 (Thread 0x7f699bfcd900 (LWP 13647)):
> #0 0x7f6998020286 in ppoll () from /lib64/libc.so.6
The condition that is true is:
if (!QLIST_EMPTY(&bs->tracked_requests))
and it's returned for the intermediate qcow2 which is being committed.
Btw - it's also the disk that is being pounded with writes during the commit.
--
mg
bdrv_requests_pending(), called by bdrv_requests_pending_all(), is the
function that determines for each of the disks in your VM if it still
has requests in flight that need to be completed. This function must
have returned true even though there is nothing to wait for.
Can you check which of its conditions is true?
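For reference, the conditions in question are roughly the following (paraphrase of QEMU 2.0's bdrv_requests_pending(), not the verbatim source); the tracked_requests test is the one that turns out to stay true in this report:

/* Rough paraphrase of QEMU 2.0's bdrv_requests_pending() (block.c). */
static bool bdrv_requests_pending(BlockDriverState *bs)
{
    /* in-flight reads/writes on this BDS -- the check that stays true
     * for the intermediate qcow2 in this report */
    if (!QLIST_EMPTY(&bs->tracked_requests)) {
        return true;
    }
    /* requests queued by I/O throttling */
    if (!qemu_co_queue_empty(&bs->throttled_reqs[0]) ||
        !qemu_co_queue_empty(&bs->throttled_reqs[1])) {
        return true;
    }
    /* recurse into the protocol layer and the backing file chain */
    if (bs->file && bdrv_requests_pending(bs->file)) {
        return true;
    }
    if (bs->backing_hd && bdrv_requests_pending(bs->backing_hd)) {
        return true;
    }
    return false;
}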
I see that you have a mix of aio=native and aio=threads. I can't say
much about the aio=native disks (perhaps try to reproduce without
them?), but there are definitely no worker threads for the other disks
that bdrv_drain_all() would have to wait for.
True. But I/O was being done only on the qcow2 disk.
On 23.05.2014 at 11:25, Marcin Gibuła wrote:
> On 23.05.2014 10:19, Paolo Bonzini wrote:
> >On 22/05/2014 23:05, Marcin Gibuła wrote:
> >>Some more info.
> >>VM was doing a lot of write IO during this test.
> >
> >QEMU is waiting for librados to complete I/O. Can you reproduce it with
> >a different driver?
On 23.05.2014 10:19, Paolo Bonzini wrote:
On 22/05/2014 23:05, Marcin Gibuła wrote:
Some more info.
VM was doing a lot of write IO during this test.
QEMU is waiting for librados to complete I/O. Can you reproduce it with
a different driver?
Hi,
I've reproduced it without RBD. Backtrace
On 23.05.2014 10:19, Paolo Bonzini wrote:
On 22/05/2014 23:05, Marcin Gibuła wrote:
Some more info.
VM was doing a lot of write IO during this test.
QEMU is waiting for librados to complete I/O. Can you reproduce it with
a different driver?
I'll try.
However RBD is used only as a read-only backing file.
On 22/05/2014 23:05, Marcin Gibuła wrote:
Some more info.
VM was doing a lot of write IO during this test.
QEMU is waiting for librados to complete I/O. Can you reproduce it with
a different driver?
Paolo
ppoll() is listening for these descriptors (from strace):
ppoll([{fd=25, events=
I've encountered deadlock in qemu during some stress testing. The test
is making snapshots, committing them and constantly querying for block
job info.
What is the exact command you used for triggering the block-commit? Was
it via direct HMP or QMP, or indirect via libvirt?
Via libvirt.
Were
On 05/22/2014 02:49 PM, Marcin Gibuła wrote:
> Hi,
>
> I've encountered deadlock in qemu during some stress testing. The test
> is making snapshots, committing them and constantly querying for block
> job info.
What is the exact command you used for triggering the block-commit? Was
it via direct
On 2014-05-22 22:49, Marcin Gibuła wrote:
Thread 1 (Thread 0x7f699bfcd900 (LWP 13647)):
#0 0x7f6998020286 in ppoll () from /lib64/libc.so.6
#1 0x7f699c1f3d9b in ppoll (__ss=0x0, __timeout=0x0,
__nfds=<optimized out>, __fds=<optimized out>) at
/usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=<optimized out>)
Hi,
I've encountered deadlock in qemu during some stress testing. The test
is making snapshots, committing them and constantly querying for block
job info.
The version of QEMU is 2.0.0 rc3 (the backtrace below says rc2, but it's
manually patched to rc3), but there seem to be no changes in the block layer
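For anyone trying to reproduce this, the sequence described above (snapshot, commit, poll the block job in a loop) can be approximated with the libvirt C API roughly as follows. This is a hypothetical reconstruction, not the actual test script: the domain name, disk target, overlay path, and snapshot XML are placeholders.

/* Hypothetical reconstruction of the stress test (build: gcc repro.c -lvirt).
 * Domain name, disk target, overlay path and snapshot XML are placeholders. */
#include <stdio.h>
#include <unistd.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom = conn ? virDomainLookupByName(conn, "testvm") : NULL;
    const char *disk = "vda";
    const char *snap_xml =
        "<domainsnapshot><name>stress</name></domainsnapshot>";

    if (!dom) {
        return 1;
    }

    for (;;) {
        /* 1. disk-only external snapshot: adds a new qcow2 overlay */
        virDomainSnapshotPtr snap = virDomainSnapshotCreateXML(
            dom, snap_xml,
            VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
            VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA);
        if (!snap) {
            break;
        }
        virDomainSnapshotFree(snap);

        /* 2. commit the overlay back down; the top/base arguments used in
         * the report are unknown, the path below is a placeholder */
        if (virDomainBlockCommit(dom, disk, NULL /* base */,
                                 "/var/lib/libvirt/images/testvm.stress",
                                 0 /* bandwidth */, 0 /* flags */) < 0) {
            break;
        }

        /* 3. poll block job info until the commit job disappears */
        virDomainBlockJobInfo info;
        while (virDomainGetBlockJobInfo(dom, disk, &info, 0) > 0) {
            fprintf(stderr, "commit progress: %llu/%llu\n",
                    (unsigned long long)info.cur,
                    (unsigned long long)info.end);
            usleep(100 * 1000);
        }
    }

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}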