On 08/08/2017 10:26 AM, Vladimir Sementsov-Ogievskiy wrote:

Initial review - I'm still playing with this one, and will reply again
on Monday.
> Do not communicate after the first error to avoid communicating throught

s/throught/through a/

> broken channel. The only exclusion is try to send NBD_CMD_DISC anyway on
> in nbd_client_close.

I think the exclusion is wrong - if we detected the server sending us
garbage, we're probably better off disconnecting entirely rather than
trying to assume that the server will give us a clean disconnect via
NBD_CMD_DISC.

>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
> ---
>
> Hi all. Here is a patch, fixing a problem noted in
> [PATCH 06/17] block/nbd-client: fix nbd_read_reply_entry
> and
> [PATCH 17/17] block/nbd-client: always return EIO on and after the first io channel error
> and discussed on list.
>
> If it will be applied to 2.10, then I'll rebase my 'nbd client refactoring and fixing'
> on it (for 2.11). If not - I'll prefer not rebase the series, so, do not apply this
> patch for 2.11.
>
> v2: set eio_to_all in nbd_co_send_request and nbd_co_receive_reply in case of error
>
>  block/nbd-client.h |  1 +
>  block/nbd-client.c | 65 +++++++++++++++++++++++++++++++++++++++---------------
>  2 files changed, 48 insertions(+), 18 deletions(-)
>
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index df80771357..28db9922c8 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -29,6 +29,7 @@ typedef struct NBDClientSession {
>
>      Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
>      NBDReply reply;
> +    bool eio_to_all;

Bikeshedding - I'm wondering if there is a better name; maybe 'error'
or 'server_broken'?

>  } NBDClientSession;
>
>  NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 25dd28406b..a72cb7690a 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -49,6 +49,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>  {
>      NBDClientSession *client = nbd_get_client_session(bs);
>
> +    client->eio_to_all = true;
> +

Okay, if you set it here, then it does NOT mean 'server_broken' - and
if we reached this spot normally, we still WANT the NBD_CMD_DISC. So
do we really need to set it here?

>      if (!client->ioc) { /* Already closed */
>          return;
>      }
> @@ -74,12 +76,16 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>      Error *local_err = NULL;
>
>      for (;;) {
> +        if (s->eio_to_all) {
> +            break;
> +        }
> +
>          assert(s->reply.handle == 0);
>          ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>          if (ret < 0) {
>              error_report_err(local_err);
>          }
> -        if (ret <= 0) {
> +        if (ret <= 0 || s->eio_to_all) {

I'm still wondering if we need this condition at two points in the
loop, or if merely checking at the beginning of the cycle is
sufficient (like I said in my counterproposal thread, I'll be
hammering away at your patch under gdb over the weekend, to see if I
can trigger all the additions under some situation).

>              break;
>          }
>
> @@ -107,6 +113,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>          qemu_coroutine_yield();
>      }
>
> +    s->eio_to_all = true;

I think this one should be guarded by 'if (ret < 0)' - we also break
out of the for loop when ret == 0 (aka EOF because the server ended
cleanly), which is not the same as the server sending us garbage.
(Rough sketch of what I mean below, after the next quoted hunk.)

>      nbd_recv_coroutines_enter_all(s);
>      s->read_reply_co = NULL;
>  }
> @@ -118,6 +125,10 @@ static int nbd_co_send_request(BlockDriverState *bs,
>      NBDClientSession *s = nbd_get_client_session(bs);
>      int rc, ret, i;
>
> +    if (s->eio_to_all) {
> +        return -EIO;
> +    }
> +

This one is good.
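Something along these lines is what I have in mind for the tail of
nbd_read_reply_entry() - an untested sketch, reusing the identifiers
from your patch, that only marks the session broken on a real error
instead of on a clean EOF:

    /* ret == 0 means the server closed the connection cleanly;
     * only a negative ret means garbage or a dead channel. */
    if (ret < 0) {
        s->eio_to_all = true;
    }
    nbd_recv_coroutines_enter_all(s);
    s->read_reply_co = NULL;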
>      qemu_co_mutex_lock(&s->send_mutex);
>      while (s->in_flight == MAX_NBD_REQUESTS) {
>          qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
> @@ -135,15 +146,15 @@ static int nbd_co_send_request(BlockDriverState *bs,
>      assert(i < MAX_NBD_REQUESTS);
>      request->handle = INDEX_TO_HANDLE(s, i);
>
> -    if (!s->ioc) {
> +    if (s->eio_to_all) {
>          qemu_co_mutex_unlock(&s->send_mutex);
> -        return -EPIPE;
> +        return -EIO;
>      }

I don't know if changing the errno is wise; maybe we want to keep both
conditions (the !s->ioc and the s->eio_to_all) - especially if we only
set eio_to_all on an actual detection of server garbage rather than on
all exit paths. (Rough sketch of what I mean at the end of this mail.)

>
>      if (qiov) {
>          qio_channel_set_cork(s->ioc, true);
>          rc = nbd_send_request(s->ioc, request);
> -        if (rc >= 0) {
> +        if (rc >= 0 && !s->eio_to_all) {
>              ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, false,
>                            NULL);
>              if (ret != request->len) {
> @@ -155,7 +166,13 @@ static int nbd_co_send_request(BlockDriverState *bs,
>          rc = nbd_send_request(s->ioc, request);
>      }
>      qemu_co_mutex_unlock(&s->send_mutex);
> -    return rc;
> +
> +    if (rc < 0 || s->eio_to_all) {
> +        s->eio_to_all = true;
> +        return -EIO;

I think this one makes sense.

> +    }
> +
> +    return 0;
>  }
>
>  static void nbd_co_receive_reply(NBDClientSession *s,
> @@ -169,14 +186,16 @@ static void nbd_co_receive_reply(NBDClientSession *s,
>      qemu_coroutine_yield();
>      *reply = s->reply;
>      if (reply->handle != request->handle ||
> -        !s->ioc) {
> +        !s->ioc || s->eio_to_all) {
>          reply->error = EIO;
> +        s->eio_to_all = true;
>      } else {
>          if (qiov && reply->error == 0) {
>              ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, true,
>                            NULL);
> -            if (ret != request->len) {
> +            if (ret != request->len || s->eio_to_all) {
>                  reply->error = EIO;
> +                s->eio_to_all = true;
>              }

This may be redundant with setting eio_to_all in nbd_read_reply_entry.

>          }
>
> @@ -225,8 +244,10 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>      } else {
>          nbd_co_receive_reply(client, &request, &reply, qiov);
>      }
> -    nbd_coroutine_end(bs, &request);
> -    return -reply.error;
> +    if (request.handle != 0) {
> +        nbd_coroutine_end(bs, &request);
> +    }

I'm not sure about this one - don't we always need to call
nbd_coroutine_end for resource cleanup, even when we detected an
error?

> @@ -384,6 +412,7 @@ int nbd_client_init(BlockDriverState *bs,
>      logout("session init %s\n", export);
>      qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
>
> +    client->eio_to_all = false;

This one happens by default since we zero-initialize, but explicit
initialization doesn't hurt.

>      client->info.request_sizes = true;
>      ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), export,
>                                  tlscreds, hostname,
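On keeping both conditions in nbd_co_send_request(): something along
these lines is what I'm picturing - an untested sketch, reusing the
identifiers from your patch, where -EPIPE stays for a channel that was
closed normally and -EIO is reserved for a server we detected as
broken:

    /* Server already proved itself broken: fail everything with EIO. */
    if (s->eio_to_all) {
        qemu_co_mutex_unlock(&s->send_mutex);
        return -EIO;
    }
    /* Channel already torn down cleanly: keep the old EPIPE behavior. */
    if (!s->ioc) {
        qemu_co_mutex_unlock(&s->send_mutex);
        return -EPIPE;
    }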
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org