On 08/08/2017 10:26 AM, Vladimir Sementsov-Ogievskiy wrote:

Initial review - I'm still playing with this one, and will reply again
on Monday.
> Do not communicate after the first error to avoid communicating throught

s/throught/through a/

> broken channel. The only exclusion is try to send NBD_CMD_DISC anyway on
> in nbd_client_close.

I think the exclusion is wrong - if we detected the server sending us
garbage, we're probably better off disconnecting entirely rather than
trying to assume that the server will give us a clean disconnect via
NBD_CMD_DISC.

>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
> ---
>
> Hi all. Here is a patch, fixing a problem noted in
> [PATCH 06/17] block/nbd-client: fix nbd_read_reply_entry
> and
> [PATCH 17/17] block/nbd-client: always return EIO on and after the first io channel error
> and discussed on list.
>
> If it will be applied to 2.10, then I'll rebase my 'nbd client refactoring and fixing'
> on it (for 2.11). If not - I'll prefer not rebase the series, so, do not apply this
> patch for 2.11.
>
> v2: set eio_to_all in nbd_co_send_request and nbd_co_receive_reply in case of error
>
>  block/nbd-client.h |  1 +
>  block/nbd-client.c | 65 +++++++++++++++++++++++++++++++++++++++---------------
>  2 files changed, 48 insertions(+), 18 deletions(-)
>
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index df80771357..28db9922c8 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -29,6 +29,7 @@ typedef struct NBDClientSession {
>
>      Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
>      NBDReply reply;
> +    bool eio_to_all;

Bikeshedding - I'm wondering if there is a better name; maybe 'error'
or 'server_broken'?

>  } NBDClientSession;
>
>  NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 25dd28406b..a72cb7690a 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -49,6 +49,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>  {
>      NBDClientSession *client = nbd_get_client_session(bs);
>
> +    client->eio_to_all = true;
> +

Okay, if you set it here, then it does NOT mean 'server_broken' - and
if we reached this spot normally, we still WANT the NBD_CMD_DISC. So
do we really need to set it here?

>      if (!client->ioc) { /* Already closed */
>          return;
>      }
> @@ -74,12 +76,16 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>      Error *local_err = NULL;
>
>      for (;;) {
> +        if (s->eio_to_all) {
> +            break;
> +        }
> +
>          assert(s->reply.handle == 0);
>          ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>          if (ret < 0) {
>              error_report_err(local_err);
>          }
> -        if (ret <= 0) {
> +        if (ret <= 0 || s->eio_to_all) {

I'm still wondering if we need this condition at two points in the
loop, or if merely checking at the beginning of the cycle is
sufficient (like I said in my counterproposal thread, I'll be
hammering away at your patch under gdb over the weekend, to see if I
can trigger all the additions under some situation).

>              break;
>          }
>
> @@ -107,6 +113,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>          qemu_coroutine_yield();
>      }
>
> +    s->eio_to_all = true;

I think this one should be guarded by 'if (ret < 0)' - we also break
out of the for loop when ret == 0 (aka EOF because the server ended
cleanly), which is not the same as the server sending us garbage.
(Rough sketch of what I mean below, after the next quoted hunk.)

>      nbd_recv_coroutines_enter_all(s);
>      s->read_reply_co = NULL;
>  }
> @@ -118,6 +125,10 @@ static int nbd_co_send_request(BlockDriverState *bs,
>      NBDClientSession *s = nbd_get_client_session(bs);
>      int rc, ret, i;
>
> +    if (s->eio_to_all) {
> +        return -EIO;
> +    }
> +

This one is good.
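Something along these lines is what I have in mind for the tail of
nbd_read_reply_entry() - an untested sketch, reusing the identifiers
from your patch, that only marks the session broken on a real error
instead of on a clean EOF:

    /* ret == 0 means the server closed the connection cleanly;
     * only a negative ret means garbage or a dead channel. */
    if (ret < 0) {
        s->eio_to_all = true;
    }
    nbd_recv_coroutines_enter_all(s);
    s->read_reply_co = NULL;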
>      qemu_co_mutex_lock(&s->send_mutex);
>      while (s->in_flight == MAX_NBD_REQUESTS) {
>          qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
> @@ -135,15 +146,15 @@ static int nbd_co_send_request(BlockDriverState *bs,
>      assert(i < MAX_NBD_REQUESTS);
>      request->handle = INDEX_TO_HANDLE(s, i);
>
> -    if (!s->ioc) {
> +    if (s->eio_to_all) {
>          qemu_co_mutex_unlock(&s->send_mutex);
> -        return -EPIPE;
> +        return -EIO;
>      }

I don't know if changing the errno is wise; maybe we want to keep both
conditions (the !s->ioc and the s->eio_to_all) - especially if we only
set eio_to_all on an actual detection of server garbage rather than on
all exit paths. (Rough sketch of what I mean at the end of this mail.)

>
>      if (qiov) {
>          qio_channel_set_cork(s->ioc, true);
>          rc = nbd_send_request(s->ioc, request);
> -        if (rc >= 0) {
> +        if (rc >= 0 && !s->eio_to_all) {
>              ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, false,
>                            NULL);
>              if (ret != request->len) {
> @@ -155,7 +166,13 @@ static int nbd_co_send_request(BlockDriverState *bs,
>          rc = nbd_send_request(s->ioc, request);
>      }
>      qemu_co_mutex_unlock(&s->send_mutex);
> -    return rc;
> +
> +    if (rc < 0 || s->eio_to_all) {
> +        s->eio_to_all = true;
> +        return -EIO;

I think this one makes sense.

> +    }
> +
> +    return 0;
>  }
>
>  static void nbd_co_receive_reply(NBDClientSession *s,
> @@ -169,14 +186,16 @@ static void nbd_co_receive_reply(NBDClientSession *s,
>      qemu_coroutine_yield();
>      *reply = s->reply;
>      if (reply->handle != request->handle ||
> -        !s->ioc) {
> +        !s->ioc || s->eio_to_all) {
>          reply->error = EIO;
> +        s->eio_to_all = true;
>      } else {
>          if (qiov && reply->error == 0) {
>              ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, true,
>                            NULL);
> -            if (ret != request->len) {
> +            if (ret != request->len || s->eio_to_all) {
>                  reply->error = EIO;
> +                s->eio_to_all = true;
>              }

This may be redundant with setting eio_to_all in nbd_read_reply_entry.

>          }
>
> @@ -225,8 +244,10 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>      } else {
>          nbd_co_receive_reply(client, &request, &reply, qiov);
>      }
> -    nbd_coroutine_end(bs, &request);
> -    return -reply.error;
> +    if (request.handle != 0) {
> +        nbd_coroutine_end(bs, &request);
> +    }

I'm not sure about this one - don't we always need to call
nbd_coroutine_end for resource cleanup, even when we detected an
error?

> @@ -384,6 +412,7 @@ int nbd_client_init(BlockDriverState *bs,
>      logout("session init %s\n", export);
>      qio_channel_set_blocking(QIO_CHANNEL(sioc), true, NULL);
>
> +    client->eio_to_all = false;

This one happens by default since we zero-initialize, but explicit
initialization doesn't hurt.

>      client->info.request_sizes = true;
>      ret = nbd_receive_negotiate(QIO_CHANNEL(sioc), export,
>                                  tlscreds, hostname,
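On keeping both conditions in nbd_co_send_request(): something along
these lines is what I'm picturing - an untested sketch, reusing the
identifiers from your patch, where -EPIPE stays for a channel that was
closed normally and -EIO is reserved for a server we detected as
broken:

    /* Server already proved itself broken: fail everything with EIO. */
    if (s->eio_to_all) {
        qemu_co_mutex_unlock(&s->send_mutex);
        return -EIO;
    }
    /* Channel already torn down cleanly: keep the old EPIPE behavior. */
    if (!s->ioc) {
        qemu_co_mutex_unlock(&s->send_mutex);
        return -EPIPE;
    }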
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org