On 07/02/2015 10:51, w00214312 wrote:
> From: Bin Wu <wu.wu...@huawei.com>
>
> When we test the drive_mirror between different hosts by nbd devices,
> we find that, during the cancel phase the qemu process crashes sometimes.
> By checking the crash core file, we find the stack as follows, which means
> a coroutine re-enter error occurs:
This bug can probably be fixed simply by delaying the setting of
recv_coroutine.

What are the symptoms if you only apply your "qemu-coroutine-lock: fix
co_queue multi-adding bug" patch but not "qemu-coroutine: fix
qemu_co_queue_run_restart error"?

Can you try the patch below? (Compile-tested only.)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 6e1c97c..23d6a71 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -104,10 +104,21 @@ static int nbd_co_send_request(NbdClientSession *s,
                                QEMUIOVector *qiov, int offset)
 {
     AioContext *aio_context;
-    int rc, ret;
+    int rc, ret, i;
 
     qemu_co_mutex_lock(&s->send_mutex);
+
+    for (i = 0; i < MAX_NBD_REQUESTS; i++) {
+        if (s->recv_coroutine[i] == NULL) {
+            s->recv_coroutine[i] = qemu_coroutine_self();
+            break;
+        }
+    }
+
+    assert(i < MAX_NBD_REQUESTS);
+    request->handle = INDEX_TO_HANDLE(s, i);
     s->send_coroutine = qemu_coroutine_self();
+
     aio_context = bdrv_get_aio_context(s->bs);
     aio_set_fd_handler(aio_context, s->sock,
                        nbd_reply_ready, nbd_restart_write, s);
@@ -164,8 +175,6 @@ static void nbd_co_receive_reply(NbdClientSession *s,
 static void nbd_coroutine_start(NbdClientSession *s,
                                 struct nbd_request *request)
 {
-    int i;
-
     /* Poor man semaphore. The free_sema is locked when no other request
      * can be accepted, and unlocked after receiving one reply. */
     if (s->in_flight >= MAX_NBD_REQUESTS - 1) {
@@ -174,15 +183,7 @@ static void nbd_coroutine_start(NbdClientSession *s,
     }
     s->in_flight++;
 
-    for (i = 0; i < MAX_NBD_REQUESTS; i++) {
-        if (s->recv_coroutine[i] == NULL) {
-            s->recv_coroutine[i] = qemu_coroutine_self();
-            break;
-        }
-    }
-
-    assert(i < MAX_NBD_REQUESTS);
-    request->handle = INDEX_TO_HANDLE(s, i);
+    /* s->recv_coroutine[i] is set as soon as we get the send_lock. */
 }
 
 static void nbd_coroutine_end(NbdClientSession *s,
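To make the suspected race concrete, here is a toy, single-threaded C model. It is not QEMU code: `Co`, `Session`, `claim_slot`, `wakers_of`, `wakers_while_parked` and `MAX_REQS` are all hypothetical. It assumes the crash happens because, with the old ordering, a request coroutine is reachable both from recv_coroutine[] and from send_mutex's wait queue while it is parked on the lock, so a cancel path that enters every recv_coroutine[] entry and the later mutex hand-off could both enter the same coroutine. Counting those references shows why claiming the slot only after taking send_mutex leaves a single waker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 16 /* hypothetical stand-in for MAX_NBD_REQUESTS */

typedef struct { int id; } Co; /* toy coroutine: only its address matters */

typedef struct {
    Co *lock_waiter;     /* toy send_mutex wait queue (one entry) */
    Co *recv[MAX_REQS];  /* toy recv_coroutine[] table */
} Session;

/* Claim the first free slot for co, like the loop moved by the patch. */
static int claim_slot(Session *s, Co *co)
{
    int i;
    for (i = 0; i < MAX_REQS; i++) {
        if (s->recv[i] == NULL) {
            s->recv[i] = co;
            break;
        }
    }
    assert(i < MAX_REQS); /* mirrors the assert in the patch */
    return i;
}

/* Count the places from which co could be re-entered. A coroutine
 * parked at a single yield point should be reachable from exactly one. */
static int wakers_of(const Session *s, const Co *co)
{
    int n = (s->lock_waiter == co);
    for (int i = 0; i < MAX_REQS; i++) {
        n += (s->recv[i] == co);
    }
    return n;
}

/* Model a request issued while send_mutex is held by another request and
 * return how many wakers co has while it waits for the lock. In the
 * patched order (claim_before_lock == false) claim_slot() would run only
 * after co has left the wait queue, so it is not called here. */
static int wakers_while_parked(bool claim_before_lock)
{
    Session s = {0};
    Co co = {1};

    if (claim_before_lock) {
        claim_slot(&s, &co);  /* old order: slot set in nbd_coroutine_start */
    }
    s.lock_waiter = &co;      /* lock busy: co queues on send_mutex */
    return wakers_of(&s, &co);
}
```

With the old ordering, `wakers_while_parked(true)` returns 2 (table entry plus lock queue); with the patched ordering, `wakers_while_parked(false)` returns 1.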