On 23.08.2011 19:14, MORITA Kazutaka wrote:
> At Tue, 23 Aug 2011 14:29:50 +0200,
> Kevin Wolf wrote:
>>
>> On 12.08.2011 14:33, MORITA Kazutaka wrote:
>>> This makes the sheepdog block driver support bdrv_co_readv/writev
>>> instead of bdrv_aio_readv/writev.
>>>
>>> With this patch, Sheepdog network I/O becomes fully asynchronous. The
>>> block driver yields back when send/recv returns EAGAIN, and is resumed
>>> when the sheepdog network connection is ready for the operation.
>>>
>>> Signed-off-by: MORITA Kazutaka <morita.kazut...@lab.ntt.co.jp>
>>> ---
>>>  block/sheepdog.c |  150 +++++++++++++++++++++++++++++++++--------------------
>>>  1 files changed, 93 insertions(+), 57 deletions(-)
>>>
>>> diff --git a/block/sheepdog.c b/block/sheepdog.c
>>> index e150ac0..e283c34 100644
>>> --- a/block/sheepdog.c
>>> +++ b/block/sheepdog.c
>>> @@ -274,7 +274,7 @@ struct SheepdogAIOCB {
>>>      int ret;
>>>      enum AIOCBState aiocb_type;
>>>
>>> -    QEMUBH *bh;
>>> +    Coroutine *coroutine;
>>>      void (*aio_done_func)(SheepdogAIOCB *);
>>>
>>>      int canceled;
>>> @@ -295,6 +295,10 @@ typedef struct BDRVSheepdogState {
>>>      char *port;
>>>      int fd;
>>>
>>> +    CoMutex lock;
>>> +    Coroutine *co_send;
>>> +    Coroutine *co_recv;
>>> +
>>>      uint32_t aioreq_seq_num;
>>>      QLIST_HEAD(outstanding_aio_head, AIOReq) outstanding_aio_head;
>>>  } BDRVSheepdogState;
>>> @@ -346,19 +350,16 @@ static const char * sd_strerror(int err)
>>>  /*
>>>   * Sheepdog I/O handling:
>>>   *
>>> - * 1. In the sd_aio_readv/writev, read/write requests are added to the
>>> - *    QEMU Bottom Halves.
>>> - *
>>> - * 2. In sd_readv_writev_bh_cb, the callbacks of BHs, we send the I/O
>>> - *    requests to the server and link the requests to the
>>> - *    outstanding_list in the BDRVSheepdogState. we exits the
>>> - *    function without waiting for receiving the response.
>>> + * 1. In sd_co_rw_vector, we send the I/O requests to the server and
>>> + *    link the requests to the outstanding_list in the
>>> + *    BDRVSheepdogState. The function exits without waiting for
>>> + *    receiving the response.
>>>   *
>>> - * 3. We receive the response in aio_read_response, the fd handler to
>>> + * 2. We receive the response in aio_read_response, the fd handler to
>>>   *    the sheepdog connection. If metadata update is needed, we send
>>>   *    the write request to the vdi object in sd_write_done, the write
>>> - *    completion function. The AIOCB callback is not called until all
>>> - *    the requests belonging to the AIOCB are finished.
>>> + *    completion function. We switch back to sd_co_readv/writev after
>>> + *    all the requests belonging to the AIOCB are finished.
>>>   */
>>>
>>>  static inline AIOReq *alloc_aio_req(BDRVSheepdogState *s, SheepdogAIOCB *acb,
>>> @@ -398,7 +399,7 @@ static inline int free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
>>>  static void sd_finish_aiocb(SheepdogAIOCB *acb)
>>>  {
>>>      if (!acb->canceled) {
>>> -        acb->common.cb(acb->common.opaque, acb->ret);
>>> +        qemu_coroutine_enter(acb->coroutine, NULL);
>>>      }
>>>      qemu_aio_release(acb);
>>>  }
>>> @@ -411,7 +412,8 @@ static void sd_aio_cancel(BlockDriverAIOCB *blockacb)
>>>       * Sheepdog cannot cancel the requests which are already sent to
>>>       * the servers, so we just complete the request with -EIO here.
>>>       */
>>> -    acb->common.cb(acb->common.opaque, -EIO);
>>> +    acb->ret = -EIO;
>>> +    qemu_coroutine_enter(acb->coroutine, NULL);
>>>      acb->canceled = 1;
>>>  }
>>>
>>> @@ -435,24 +437,12 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
>>>
>>>      acb->aio_done_func = NULL;
>>>      acb->canceled = 0;
>>> -    acb->bh = NULL;
>>> +    acb->coroutine = qemu_coroutine_self();
>>>      acb->ret = 0;
>>>      QLIST_INIT(&acb->aioreq_head);
>>>      return acb;
>>>  }
>>>
>>> -static int sd_schedule_bh(QEMUBHFunc *cb, SheepdogAIOCB *acb)
>>> -{
>>> -    if (acb->bh) {
>>> -        error_report("bug: %d %d", acb->aiocb_type, acb->aiocb_type);
>>> -        return -EIO;
>>> -    }
>>> -
>>> -    acb->bh = qemu_bh_new(cb, acb);
>>> -    qemu_bh_schedule(acb->bh);
>>> -    return 0;
>>> -}
>>> -
>>>  #ifdef _WIN32
>>>
>>>  struct msghdr {
>>> @@ -635,7 +625,13 @@ static int do_readv_writev(int sockfd, struct iovec *iov, int len,
>>>  again:
>>>      ret = do_send_recv(sockfd, iov, len, iov_offset, write);
>>>      if (ret < 0) {
>>> -        if (errno == EINTR || errno == EAGAIN) {
>>> +        if (errno == EINTR) {
>>> +            goto again;
>>> +        }
>>> +        if (errno == EAGAIN) {
>>> +            if (qemu_in_coroutine()) {
>>> +                qemu_coroutine_yield();
>>> +            }
>>
>> Who reenters the coroutine if we yield here?
>
> The fd handlers, co_write_request() and co_read_response(), will call
> qemu_coroutine_enter() for the coroutine. It will be restarted after
> the sheepdog network connection becomes ready.
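Ah, I see. So the resume path is roughly the sketch below; only the
handler names, the co_send/co_recv fields and the qemu_coroutine_enter()
call are taken from the patch and your reply, the bodies are just my
guess at the idea, not the actual code:

static void co_read_response(void *opaque)
{
    BDRVSheepdogState *s = opaque;

    /* wake up the coroutine that yielded on EAGAIN while receiving */
    if (s->co_recv) {
        qemu_coroutine_enter(s->co_recv, NULL);
    }
}

static void co_write_request(void *opaque)
{
    BDRVSheepdogState *s = opaque;

    /* same thing for the send side */
    if (s->co_send) {
        qemu_coroutine_enter(s->co_send, NULL);
    }
}

In other words, whichever coroutine last yielded on the socket gets
re-entered from the fd handler once the connection is ready again.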
Makes sense. Applied to the block branch.

>> And of course for a coroutine based block driver I would like to see
>> much less code in callback functions. But it's a good start anyway.
>
> Yes, actually they can be much simpler. This patch adds asynchronous
> I/O support with a minimal change based on a coroutine, and I think
> code simplification would be the next step.

Ok, if you're looking into this, perfect. :-)

Kevin
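PS: Just to sketch the direction I mean by "much less code in callback
functions". This is only how I imagine it, untested, and the
sd_aio_setup() argument list and the sd_co_rw_vector() return value are
guessed from the patch context; the request path itself could wait for
completion instead of going through aio_done_func:

static int coroutine_fn sd_co_readv(BlockDriverState *bs, int64_t sector_num,
                                    int nb_sectors, QEMUIOVector *qiov)
{
    SheepdogAIOCB *acb;
    int ret;

    /* argument list guessed; set up the AIOCB as sd_aio_setup() does today */
    acb = sd_aio_setup(bs, qiov, sector_num, nb_sectors);

    /* send the requests and link them to the outstanding list */
    ret = sd_co_rw_vector(acb);
    if (ret < 0) {
        return ret;
    }

    /* re-entered from aio_read_response() once all requests belonging
     * to this AIOCB have completed */
    qemu_coroutine_yield();

    return acb->ret;
}

With something like that, sd_finish_aiocb() and most of the completion
callbacks could probably go away.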