On Wed, Sep 05, 2012 at 12:01:58PM +0200, Kevin Wolf wrote:
> Am 05.09.2012 09:41, schrieb Bharata B Rao:
> > On Thu, Aug 09, 2012 at 06:32:16PM +0530, Bharata B Rao wrote:
> >> +static void qemu_gluster_complete_aio(GlusterAIOCB *acb)
> >> +{
> >> +    int ret;
> >> +
> >> +    if (acb->canceled) {
> >> +        qemu_aio_release(acb);
> >> +        return;
> >> +    }
> >> +
> >> +    if (acb->ret == acb->size) {
> >> +        ret = 0; /* Success */
> >> +    } else if (acb->ret < 0) {
> >> +        ret = acb->ret; /* Read/Write failed */
> >> +    } else {
> >> +        ret = -EIO; /* Partial read/write - fail it */
> >> +    }
> >> +    acb->common.cb(acb->common.opaque, ret);
> >
> > The .cb() here is bdrv_co_io_em_complete(). It does qemu_coroutine_enter(),
> > handles the return value and comes back here.
>
> Right.
>
> .cb is set by qemu_gluster_aio_rw/flush(), and the only way these can be
> called is through bdrv_co_io_em() and bdrv_co_flush(), which both set
> bdrv_co_io_em_complete as the callback.

Right.
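To make that mechanism concrete for anyone following the thread, this is
roughly what bdrv_co_io_em() in block.c does: it registers
bdrv_co_io_em_complete() as the AIO callback and yields, and the callback's
qemu_coroutine_enter() is what re-enters the yielded coroutine later. A
paraphrased sketch from memory, not the exact code:

typedef struct CoroutineIOCompletion {
    Coroutine *coroutine;
    int ret;
} CoroutineIOCompletion;

static void bdrv_co_io_em_complete(void *opaque, int ret)
{
    CoroutineIOCompletion *co = opaque;

    /* Remember the return value and re-enter the coroutine that
     * yielded in bdrv_co_io_em() below. */
    co->ret = ret;
    qemu_coroutine_enter(co->coroutine, NULL);
}

static int coroutine_fn bdrv_co_io_em(BlockDriverState *bs, int64_t sector_num,
                                      int nb_sectors, QEMUIOVector *iov,
                                      bool is_write)
{
    CoroutineIOCompletion co = {
        .coroutine = qemu_coroutine_self(),
    };
    BlockDriverAIOCB *acb;

    /* For gluster this ends up in qemu_gluster_aio_readv/writev(),
     * which stores bdrv_co_io_em_complete as acb->common.cb. */
    if (is_write) {
        acb = bs->drv->bdrv_aio_writev(bs, sector_num, iov, nb_sectors,
                                       bdrv_co_io_em_complete, &co);
    } else {
        acb = bs->drv->bdrv_aio_readv(bs, sector_num, iov, nb_sectors,
                                      bdrv_co_io_em_complete, &co);
    }
    if (!acb) {
        return -EIO;
    }

    /* Give up control until bdrv_co_io_em_complete() enters us again. */
    qemu_coroutine_yield();

    return co.ret;
}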
> > But if the bdrv_read or bdrv_write or bdrv_flush was called from a
> > coroutine context (as against they themselves creating a new coroutine),
> > the above .cb() call doesn't return to this point.
>
> Why?

Note that in this particular scenario (qemu-img create -f qcow2), bdrv_read
and bdrv_write are called from the coroutine that is running qcow2_create().
So bdrv_read finds itself already in coroutine context and hence continues
in that same coroutine:

    if (qemu_in_coroutine()) {
        /* Fast-path if already in coroutine context */
        bdrv_rw_co_entry(&rwco);
    }

The path taken is:

bdrv_rw_co_entry -> bdrv_co_do_readv -> bdrv_co_readv_em -> bdrv_co_io_em ->
qemu_gluster_aio_readv

bdrv_co_io_em does qemu_coroutine_yield() next. When the AIO completes,
qemu_gluster_complete_aio() is run as the read end of the pipe becomes ready,
so I assume it starts out in non-coroutine context. When it does
acb->common.cb(), it enters the coroutine which was yielded by bdrv_co_io_em.
The read call then returns and we ultimately end up back in bdrv_rw_co_entry,
which takes us back to bdrv_read and then to bdrv_pwrite, where all this
originated (note that qcow2_create2 called bdrv_pwrite in the first place).
So I never come back to the statement in qemu_gluster_complete_aio() that
follows acb->common.cb(acb->common.opaque, ret). The coroutine didn't end,
and it continued further by issuing another bdrv_write call.

The effect of this is seen next when qcow2_create calls bdrv_close, which
does bdrv_drain_all, which calls qemu_aio_wait, and I never come out of it.
In qemu_aio_wait, node->io_flush(node->opaque) always returns a non-zero
value, because node->io_flush, which is qemu_gluster_aio_flush_cb(), always
returns non-zero. This happens because I never got a chance to decrement
s->qemu_aio_count, which was supposed to happen after
qemu_gluster_complete_aio() came back from the .cb() call.

So this is what I think is happening; I hope I have got it right.

Note that when I schedule a BH in qemu_gluster_complete_aio() (a rough
sketch of what I mean follows below), things work fine, apparently because
I am then able to continue and decrement s->qemu_aio_count.

Regards,
Bharata.
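PS: For reference, the BH variant I mean looks roughly like this. This is a
sketch with hypothetical names (qemu_gluster_aio_bh and the bh field in
GlusterAIOCB are my illustration, not the posted patch), and it omits the
acb->canceled handling for brevity. The idea is to finish the driver-side
accounting before entering the coroutine, so nothing important is left
pending if .cb() never returns here:

static void qemu_gluster_aio_bh(void *opaque)
{
    GlusterAIOCB *acb = opaque;
    BDRVGlusterState *s = acb->common.bs->opaque;
    BlockDriverCompletionFunc *cb = acb->common.cb;
    void *cb_opaque = acb->common.opaque;
    int ret;

    if (acb->ret == acb->size) {
        ret = 0; /* Success */
    } else if (acb->ret < 0) {
        ret = acb->ret; /* Read/Write failed */
    } else {
        ret = -EIO; /* Partial read/write - fail it */
    }

    qemu_bh_delete(acb->bh);

    /* Decrement the in-flight count and release the ACB *before*
     * entering the coroutine: even if cb() does not return here,
     * qemu_gluster_aio_flush_cb() can now reach zero and
     * qemu_aio_wait() makes progress. */
    s->qemu_aio_count--;
    qemu_aio_release(acb);

    cb(cb_opaque, ret);
}

/* ...and in qemu_gluster_complete_aio(), instead of calling
 * acb->common.cb() directly:
 *
 *     acb->bh = qemu_bh_new(qemu_gluster_aio_bh, acb);
 *     qemu_bh_schedule(acb->bh);
 */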