At Tue, 23 Aug 2011 14:29:50 +0200, Kevin Wolf wrote: > > Am 12.08.2011 14:33, schrieb MORITA Kazutaka: > > This makes the sheepdog block driver support bdrv_co_readv/writev > > instead of bdrv_aio_readv/writev. > > > > With this patch, Sheepdog network I/O becomes fully asynchronous. The > > block driver yields back when send/recv returns EAGAIN, and is resumed > > when the sheepdog network connection is ready for the operation. > > > > Signed-off-by: MORITA Kazutaka <morita.kazut...@lab.ntt.co.jp> > > --- > > block/sheepdog.c | 150 > > +++++++++++++++++++++++++++++++++-------------------- > > 1 files changed, 93 insertions(+), 57 deletions(-) > > > > diff --git a/block/sheepdog.c b/block/sheepdog.c > > index e150ac0..e283c34 100644 > > --- a/block/sheepdog.c > > +++ b/block/sheepdog.c > > @@ -274,7 +274,7 @@ struct SheepdogAIOCB { > > int ret; > > enum AIOCBState aiocb_type; > > > > - QEMUBH *bh; > > + Coroutine *coroutine; > > void (*aio_done_func)(SheepdogAIOCB *); > > > > int canceled; > > @@ -295,6 +295,10 @@ typedef struct BDRVSheepdogState { > > char *port; > > int fd; > > > > + CoMutex lock; > > + Coroutine *co_send; > > + Coroutine *co_recv; > > + > > uint32_t aioreq_seq_num; > > QLIST_HEAD(outstanding_aio_head, AIOReq) outstanding_aio_head; > > } BDRVSheepdogState; > > @@ -346,19 +350,16 @@ static const char * sd_strerror(int err) > > /* > > * Sheepdog I/O handling: > > * > > - * 1. In the sd_aio_readv/writev, read/write requests are added to the > > - * QEMU Bottom Halves. > > - * > > - * 2. In sd_readv_writev_bh_cb, the callbacks of BHs, we send the I/O > > - * requests to the server and link the requests to the > > - * outstanding_list in the BDRVSheepdogState. we exits the > > - * function without waiting for receiving the response. > > + * 1. In sd_co_rw_vector, we send the I/O requests to the server and > > + * link the requests to the outstanding_list in the > > + * BDRVSheepdogState. The function exits without waiting for > > + * receiving the response. > > * > > - * 3. We receive the response in aio_read_response, the fd handler to > > + * 2. We receive the response in aio_read_response, the fd handler to > > * the sheepdog connection. If metadata update is needed, we send > > * the write request to the vdi object in sd_write_done, the write > > - * completion function. The AIOCB callback is not called until all > > - * the requests belonging to the AIOCB are finished. > > + * completion function. We switch back to sd_co_readv/writev after > > + * all the requests belonging to the AIOCB are finished. > > */ > > > > static inline AIOReq *alloc_aio_req(BDRVSheepdogState *s, SheepdogAIOCB > > *acb, > > @@ -398,7 +399,7 @@ static inline int free_aio_req(BDRVSheepdogState *s, > > AIOReq *aio_req) > > static void sd_finish_aiocb(SheepdogAIOCB *acb) > > { > > if (!acb->canceled) { > > - acb->common.cb(acb->common.opaque, acb->ret); > > + qemu_coroutine_enter(acb->coroutine, NULL); > > } > > qemu_aio_release(acb); > > } > > @@ -411,7 +412,8 @@ static void sd_aio_cancel(BlockDriverAIOCB *blockacb) > > * Sheepdog cannot cancel the requests which are already sent to > > * the servers, so we just complete the request with -EIO here. > > */ > > - acb->common.cb(acb->common.opaque, -EIO); > > + acb->ret = -EIO; > > + qemu_coroutine_enter(acb->coroutine, NULL); > > acb->canceled = 1; > > } > > > > @@ -435,24 +437,12 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState > > *bs, QEMUIOVector *qiov, > > > > acb->aio_done_func = NULL; > > acb->canceled = 0; > > - acb->bh = NULL; > > + acb->coroutine = qemu_coroutine_self(); > > acb->ret = 0; > > QLIST_INIT(&acb->aioreq_head); > > return acb; > > } > > > > -static int sd_schedule_bh(QEMUBHFunc *cb, SheepdogAIOCB *acb) > > -{ > > - if (acb->bh) { > > - error_report("bug: %d %d", acb->aiocb_type, acb->aiocb_type); > > - return -EIO; > > - } > > - > > - acb->bh = qemu_bh_new(cb, acb); > > - qemu_bh_schedule(acb->bh); > > - return 0; > > -} > > - > > #ifdef _WIN32 > > > > struct msghdr { > > @@ -635,7 +625,13 @@ static int do_readv_writev(int sockfd, struct iovec > > *iov, int len, > > again: > > ret = do_send_recv(sockfd, iov, len, iov_offset, write); > > if (ret < 0) { > > - if (errno == EINTR || errno == EAGAIN) { > > + if (errno == EINTR) { > > + goto again; > > + } > > + if (errno == EAGAIN) { > > + if (qemu_in_coroutine()) { > > + qemu_coroutine_yield(); > > + } > > Who reenters the coroutine if we yield here?
The fd handlers, co_write_request() and co_read_response(), will call qemu_coroutine_enter() for the coroutine. It will be restarted after the sheepdog network connection becomes ready. > > And of course for a coroutine based block driver I would like to see > much less code in callback functions. But it's a good start anyway. Yes, actually they can be much simpler. This patch adds asynchrous I/O support with a minimal change based on a coroutine, and I think code simplification would be the next step. Thanks, Kazutaka