On Wed, Sep 05, 2012 at 12:01:58PM +0200, Kevin Wolf wrote:
> Am 05.09.2012 09:41, schrieb Bharata B Rao:
> > On Thu, Aug 09, 2012 at 06:32:16PM +0530, Bharata B Rao wrote:
> >> +static void qemu_gluster_complete_aio(GlusterAIOCB *acb)
> >> +{
> >> +    int ret;
> >> +
> >> +    if (acb->canceled) {
> >> +        qemu_aio_release(acb);
> >> +        return;
> >> +    }
> >> +
> >> +    if (acb->ret == acb->size) {
> >> +        ret = 0; /* Success */
> >> +    } else if (acb->ret < 0) {
> >> +        ret = acb->ret; /* Read/Write failed */
> >> +    } else {
> >> +        ret = -EIO; /* Partial read/write - fail it */
> >> +    }
> >> +    acb->common.cb(acb->common.opaque, ret);
> >
> > The .cb() here is bdrv_co_io_em_complete(). It does qemu_coroutine_enter(),
> > handles the return value and comes back here.
>
> Right.
>
> .cb is set by qemu_gluster_aio_rw/flush(), and the only way these can be
> called is through bdrv_co_io_em() and bdrv_co_flush(), which both set
> bdrv_co_io_em_complete as the callback.

Right.
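To make that mechanism concrete for anyone following the thread, this is
roughly what bdrv_co_io_em() in block.c does: it registers
bdrv_co_io_em_complete() as the AIO callback and yields, and the callback's
qemu_coroutine_enter() is what re-enters the yielded coroutine later. A
paraphrased sketch from memory, not the exact code:

typedef struct CoroutineIOCompletion {
    Coroutine *coroutine;
    int ret;
} CoroutineIOCompletion;

static void bdrv_co_io_em_complete(void *opaque, int ret)
{
    CoroutineIOCompletion *co = opaque;

    /* Remember the return value and re-enter the coroutine that
     * yielded in bdrv_co_io_em() below. */
    co->ret = ret;
    qemu_coroutine_enter(co->coroutine, NULL);
}

static int coroutine_fn bdrv_co_io_em(BlockDriverState *bs, int64_t sector_num,
                                      int nb_sectors, QEMUIOVector *iov,
                                      bool is_write)
{
    CoroutineIOCompletion co = {
        .coroutine = qemu_coroutine_self(),
    };
    BlockDriverAIOCB *acb;

    /* For gluster this ends up in qemu_gluster_aio_readv/writev(),
     * which stores bdrv_co_io_em_complete as acb->common.cb. */
    if (is_write) {
        acb = bs->drv->bdrv_aio_writev(bs, sector_num, iov, nb_sectors,
                                       bdrv_co_io_em_complete, &co);
    } else {
        acb = bs->drv->bdrv_aio_readv(bs, sector_num, iov, nb_sectors,
                                      bdrv_co_io_em_complete, &co);
    }
    if (!acb) {
        return -EIO;
    }

    /* Give up control until bdrv_co_io_em_complete() enters us again. */
    qemu_coroutine_yield();

    return co.ret;
}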
> > But if the bdrv_read or bdrv_write or bdrv_flush was called from a
> > coroutine context (as against they themselves creating a new coroutine),
> > the above .cb() call doesn't return to this point.
>
> Why?

Note that in this particular scenario (qemu-img create -f qcow2), bdrv_read
and bdrv_write are called from the coroutine that is running qcow2_create().
So bdrv_read finds itself already in coroutine context and hence continues
in that same coroutine:

    if (qemu_in_coroutine()) {
        /* Fast-path if already in coroutine context */
        bdrv_rw_co_entry(&rwco);
    }

The path taken is:

bdrv_rw_co_entry -> bdrv_co_do_readv -> bdrv_co_readv_em -> bdrv_co_io_em ->
qemu_gluster_aio_readv

bdrv_co_io_em does qemu_coroutine_yield() next. When the AIO completes,
qemu_gluster_complete_aio() is run as the read end of the pipe becomes ready,
so I assume it starts out in non-coroutine context. When it does
acb->common.cb(), it enters the coroutine which was yielded by bdrv_co_io_em.
The read call then returns and we ultimately end up back in bdrv_rw_co_entry,
which takes us back to bdrv_read and then to bdrv_pwrite, where all this
originated (note that qcow2_create2 called bdrv_pwrite in the first place).
So I never come back to the statement in qemu_gluster_complete_aio() that
follows acb->common.cb(acb->common.opaque, ret). The coroutine didn't end,
and it continued further by issuing another bdrv_write call.

The effect of this is seen next when qcow2_create calls bdrv_close, which
does bdrv_drain_all, which calls qemu_aio_wait, and I never come out of it.
In qemu_aio_wait, node->io_flush(node->opaque) always returns a non-zero
value, because node->io_flush, which is qemu_gluster_aio_flush_cb(), always
returns non-zero. This happens because I never got a chance to decrement
s->qemu_aio_count, which was supposed to happen after
qemu_gluster_complete_aio() came back from the .cb() call.

So this is what I think is happening; I hope I have got it right.

Note that when I schedule a BH in qemu_gluster_complete_aio() (a rough
sketch of what I mean follows below), things work fine, apparently because
I am then able to continue and decrement s->qemu_aio_count.

Regards,
Bharata.
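PS: For reference, the BH variant I mean looks roughly like this. This is a
sketch with hypothetical names (qemu_gluster_aio_bh and the bh field in
GlusterAIOCB are my illustration, not the posted patch), and it omits the
acb->canceled handling for brevity. The idea is to finish the driver-side
accounting before entering the coroutine, so nothing important is left
pending if .cb() never returns here:

static void qemu_gluster_aio_bh(void *opaque)
{
    GlusterAIOCB *acb = opaque;
    BDRVGlusterState *s = acb->common.bs->opaque;
    BlockDriverCompletionFunc *cb = acb->common.cb;
    void *cb_opaque = acb->common.opaque;
    int ret;

    if (acb->ret == acb->size) {
        ret = 0; /* Success */
    } else if (acb->ret < 0) {
        ret = acb->ret; /* Read/Write failed */
    } else {
        ret = -EIO; /* Partial read/write - fail it */
    }

    qemu_bh_delete(acb->bh);

    /* Decrement the in-flight count and release the ACB *before*
     * entering the coroutine: even if cb() does not return here,
     * qemu_gluster_aio_flush_cb() can now reach zero and
     * qemu_aio_wait() makes progress. */
    s->qemu_aio_count--;
    qemu_aio_release(acb);

    cb(cb_opaque, ret);
}

/* ...and in qemu_gluster_complete_aio(), instead of calling
 * acb->common.cb() directly:
 *
 *     acb->bh = qemu_bh_new(qemu_gluster_aio_bh, acb);
 *     qemu_bh_schedule(acb->bh);
 */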