Re: [Qemu-devel] [PATCH 3/9] nbd: BLOCK_STATUS for standard get_block_status function: server part

Eric Blake Fri, 16 Feb 2018 05:37:27 -0800

On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:

Minimal realization: only one extent in server answer is supported.


Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
---


continuing where I left off,

+++ b/nbd/common.c
@@ -75,6 +75,10 @@ const char *nbd_opt_lookup(uint32_t opt)
          return "go";
      case NBD_OPT_STRUCTURED_REPLY:
          return "structured reply";
+    case NBD_OPT_LIST_META_CONTEXT:
+        return "list meta context";
+    case NBD_OPT_SET_META_CONTEXT:
+        return "set meta context";
      default:

Should the changes to the header for new macros and to common.c formapping bits to names be split into a separate patch, so that someonecould backport just the new constants and then the client-sideimplementation, rather than being forced to backport the rest of theserver implementation at the same time?

+
+/* nbd_read_size_string
+ *
+ * Read string in format
+ *   uint32_t len
+ *   len bytes string (not 0-terminated)
+ *
+ * @buf should be enough to store @max_len+1
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_read_size_string(NBDClient *client, char *buf,
+                                uint32_t max_len, Error **errp)

Would existing code benefit from using this helper? If so, splitting itinto a separate patch, plus converting initial clients to use it, wouldbe worthwhile.

+
+static int nbd_negotiate_send_meta_context(NBDClient *client,
+                                           const char *context,
+                                           uint32_t context_id,
+                                           Error **errp)


No comment documenting this function?

+{
+    NBDOptionReplyMetaContext opt;
+    struct iovec iov[] = {
+        {.iov_base = &opt, .iov_len = sizeof(opt)},
+        {.iov_base = (void *)context, .iov_len = strlen(context)}

Casting to void* looks suspicious, but I see that it is casting awayconst. Okay.

+    };
+
+    set_be_option_rep(&opt.h, client->opt, NBD_REP_META_CONTEXT,
+                      sizeof(opt) - sizeof(opt.h) + iov[1].iov_len);
+    stl_be_p(&opt.context_id, context_id);
+
+    return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? -EIO : 0;
+}
+
+static void nbd_meta_base_query(NBDExportMetaContexts *meta, const char *query)
+{


Again, function comments are useful.

+    if (query[0] == '\0' || strcmp(query, "allocation") == 0) {
+        /* Note: empty query should select all contexts within base
+         * namespace. */
+        meta->base_allocation = true;

From the client perspective, this handling of the empty leaf-name workswell for NBD_OPT_LIST_META_CONTEXT (I want to see what leaf nodes theserver supports), but not so well for NBD_OPT_SET_META_CONTEXT (askingthe server to send ALL base allocations, even when I don't necessarilyknow how to interpret anything other than base:allocation, is a waste).So this function needs a bool parameter that says whether it is beinginvoked from _LIST (empty string okay, to advertise ALL base leaf namesback to client, which for now is just base:allocation) or from _SET(empty string is ignored as invalid; client has to specifically ask forbase:allocation by name).

+    }
+}
+
+/* nbd_negotiate_meta_query
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_negotiate_meta_query(NBDClient *client,
+                                    NBDExportMetaContexts *meta, Error **errp)
+{
+    int ret;
+    char *query, *colon, *namespace, *subquery;

Is it worth stack-allocating query here, so you don't have to g_free()it later? NBD limits the maximum string to 4k, which is a little bitbig for stack allocation (on an operating system with 4k pages,allocating more than 4k on the stack in one function risks missing theguard page on stack overflow), but we also have the benefit that we KNOWthat the set of meta-context namespaces that we support have a muchsmaller maximum limit of what we even care about.

+
+    ret = nbd_alloc_read_size_string(client, &query, errp);
+    if (ret <= 0) {
+        return ret;
+    }
+
+    colon = strchr(query, ':');
+    if (colon == NULL) {
+        ret = nbd_opt_invalid(client, errp, "no colon in query");
+        goto out;

Hmm, that puts a slight wrinkle into my proposal, or else maybe it issomething I should bring up on the NBD list. If we only read 5characters (because the max namespace WE support is "base:"), but aclient asks for namespace "X-longname:", we should gracefully ignore theclient's request; while we still want to reply with an error to a clientthat asks for "garbage" with no colon at all. The question for the NBDspec, then, is whether detecting bad client requests that didn't usecolon is mandatory for the server (meaning we MUST read the entirenamespace request, and search for the colon) or merely best effort (weonly have to read 5 characters, and if we silently ignore instead ofdiagnose a bad namespace request that was longer than that, oh well).Worded from the client, it switches to a question of whether the clientshould expect the server to diagnose all requests, or must be preparedfor the server to ignore requests even where those requests are bogus.Or, the NBD spec may change slightly to pass namespace and leafname asseparate fields, both with lengths, rather than a colon, to make iteasier for the server to skip over an unknown namespace/leaf pairwithout having to parse whether a colon was present. I'll send that ina separate email (the upstream NBD list doesn't need to see all myreview comments on this thread).

+    }
+    *colon = '\0';
+    namespace = query;
+    subquery = colon + 1;
+
+    if (strcmp(namespace, "base") == 0) {
+        nbd_meta_base_query(meta, subquery);
+    }
+
+out:
+    g_free(query);
+    return ret;
+}
+
+/* nbd_negotiate_meta_queries
+ * Handle NBD_OPT_LIST_META_CONTEXT and NBD_OPT_SET_META_CONTEXT
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_negotiate_meta_queries(NBDClient *client,
+                                      NBDExportMetaContexts *meta, Error 
**errp)
+{

The two options can mostly share code, but see my earlier comments abouthow I think you need to distinguish between list ('base:' is a validquery) and set ('base:' is an invalid request).

+    int ret;
+    NBDExport *exp;
+    NBDExportMetaContexts local_meta;
+    uint32_t nb_queries;
+    int i;
+
+    assert(client->structured_reply);

Hmm. Asserting here is bad if the client can get here withoutnegotiating structured reply (I'll have to see if you checked that inthe caller... [1])

+
+    if (meta == NULL) {


Bikeshedding - I like writing 'if (!meta) {' as it is fewer characters.

The function documentation should mention that 'meta' may be NULL fromthe caller. But (without seeing the callers yet), why? Are you arguingthat _LIST uses local_meta (because it only needs something for theduration of the reply) while _SET supplies a pre-existing meta (that isassociated with the server state, and used when the client finally callsNBD_OPT_GO)?

+        meta = &local_meta;
+    }
+
+    memset(meta, 0, sizeof(*meta));
+
+    ret = nbd_read_size_string(client, meta->export_name,
+                               NBD_MAX_NAME_SIZE, errp);

Revisiting a question I raised in my first half review - you saved thename as part of the struct because we have to later compare that thefinal OPT_SET export name matches the request during OPT_GO (if theydon't match, then we have no contexts to serve after all). So a 'constchar *' won't work, but maybe the struct could use a 'char *' pointingto malloc'd storage rather than char[MAX_NAME] that reserves array spacethat is mostly unused for the typical name that is much shorter than themaximum name length.

+    if (ret <= 0) {
+        return ret;
+    }
+
+    exp = nbd_export_find(meta->export_name);
+    if (exp == NULL) {
+        return nbd_opt_invalid(client, errp,
+                               "export '%s' not present", meta->export_name);


Wrong error; I think NBD_REP_ERR_UNKNOWN is better here.

+    }
+
+    ret = nbd_opt_read(client, &nb_queries, sizeof(nb_queries), errp);
+    if (ret <= 0) {
+        return ret;
+    }
+    cpu_to_be32s(&nb_queries);
+
+    for (i = 0; i < nb_queries; ++i) {
+        ret = nbd_negotiate_meta_query(client, meta, errp);
+        if (ret <= 0) {
+            return ret;
+        }
+    }

Interesting choice - you parse all queries prior to preparing anyresponse. I guess you could also reply as you read, but it doesn'tchange the fact that you have to remember what the final _SET selectedfor use later on. So this works.

+
+    if (meta->base_allocation) {
+        ret = nbd_negotiate_send_meta_context(client, "base:allocation",
+                                              NBD_META_ID_BASE_ALLOCATION,
+                                              errp);

Since earlier you have #define NBD_META_ID_BASE_ALLOCATION 0, that meansyou use the context ID of 0 in response to both _LIST (where the specsays the server SHOULD send 0, but the client SHOULD ignore the idfield) and for _SET (where the client needs a unique ID for everyseparate context that actually got selected - but since we only select asingle context, we get away with it). I suspect later patches will addadditional contexts where you'll have to actually worry about contextids (and always sending 0 in response to _LIST), but for now, this works :)

+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
+    if (ret == 0) {
+        meta->valid = true;
+    }

Interesting - meta->valid can be true even if 0 contexts were selected;I presume we use this later on to decide whether to reply with EINVALvs. an empty list if a client uses NBD_CMD_BLOCK_STATUS without a valid_SET call. Then again, the NBD spec currently states thatNBD_CMD_BLOCK_STATUS should not be used even if _SET succeeded, if the_SET didn't return at least one context.

+
+    return ret;
+}
+
  /* nbd_negotiate_options
   * Process all NBD_OPT_* client option commands, during fixed newstyle
   * negotiation.
@@ -826,6 +1031,22 @@ static int nbd_negotiate_options(NBDClient *client, 
uint16_t myflags,
                  }
                  break;

+ case NBD_OPT_LIST_META_CONTEXT:

+            case NBD_OPT_SET_META_CONTEXT:
+                if (!client->structured_reply) {
+                    ret = nbd_opt_invalid(
+                            client, errp,
+                            "request option '%s' when structured reply "
+                            "is not negotiated", nbd_opt_lookup(option));


[1] okay, you did filter out clients that request out of order.

+                } else if (option == NBD_OPT_LIST_META_CONTEXT) {
+                    ret = nbd_negotiate_meta_queries(client, NULL, errp);
+                } else {
+                    ret = nbd_negotiate_meta_queries(client,
+                                                     &client->export_meta,
+                                                     errp);

Okay, so you DO have a difference on whether you pass in 'meta' based onwhether it is a local response vs. setting things for later.

+                }
+                break;
+
              default:
                  ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp,
                                     "Unsupported option 0x%" PRIx32 " (%s)",
@@ -1446,6 +1667,78 @@ static int coroutine_fn 
nbd_co_send_structured_error(NBDClient *client,
      return nbd_co_send_iov(client, iov, 1 + !!iov[1].iov_len, errp);
  }

+static int blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,

+                                    uint64_t bytes, NBDExtent *extent)
+{
+    uint64_t tail_bytes = bytes;
+
+    while (tail_bytes) {


I might have named this 'remaining'

+        uint32_t flags;
+        int64_t num;
+        int ret = bdrv_block_status_above(bs, NULL, offset, tail_bytes, &num,
+                                          NULL, NULL);
+        if (ret < 0) {
+            return ret;
+        }
+
+        flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
+                (ret & BDRV_BLOCK_ZERO      ? NBD_STATE_ZERO : 0);


Looks like the correct bit mapping to me.

+
+        if (tail_bytes == bytes) {
+            extent->flags = flags;
+        }
+
+        if (flags != extent->flags) {
+            break;
+        }
+
+        offset += num;
+        tail_bytes -= num;
+    }
+
+    cpu_to_be32s(&extent->flags);
+    extent->length = cpu_to_be32(bytes - tail_bytes);

This only computes one extent, but tries to find the longest extentpossible (if two consecutive bdrv_block_status calls return the samestatus, perhaps because of fragmentation). Presumably this is called ina loop to find a full list of extents covering the client's entirerequest? [2]

+
+    return 0;
+}
+
+/* nbd_co_send_extents
+ * @extents should be in big-endian */
+static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
+                               NBDExtent *extents, unsigned nb_extents,
+                               uint32_t context_id, Error **errp)
+{
+    NBDStructuredMeta chunk;
+
+    struct iovec iov[] = {
+        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+        {.iov_base = extents, .iov_len = nb_extents * sizeof(extents[0])}
+    };
+
+    set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_BLOCK_STATUS,
+                 handle, sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);

This says you can only send a single chunk, which matches the fact thatright now you support exactly one context type. Presumably laterpatches tweak this.

+    stl_be_p(&chunk.context_id, context_id);
+
+    return nbd_co_send_iov(client, iov, 2, errp);
+}
+
+static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
+                                    BlockDriverState *bs, uint64_t offset,
+                                    uint64_t length, uint32_t context_id,
+                                    Error **errp)


No function comment?

+{
+    int ret;
+    NBDExtent extent;
+
+    ret = blockstatus_to_extent_be(bs, offset, length, &extent);
+    if (ret < 0) {
+        return nbd_co_send_structured_error(
+                client, handle, -ret, "can't get block status", errp);
+    }
+
+    return nbd_co_send_extents(client, handle, &extent, 1, context_id, errp);

[2] Okay, no looping to answer the client's entire request, so that maybe something you add in a later patch, to keep this initial patchminimal. But at least it matches your commit message.

+}
+
  /* nbd_co_receive_request
   * Collect a client request. Return 0 if request looks valid, -EIO to drop
   * connection right away, and any other negative value to report an error to
@@ -1523,6 +1816,8 @@ static int nbd_co_receive_request(NBDRequestData *req, 
NBDRequest *request,
          valid_flags |= NBD_CMD_FLAG_DF;
      } else if (request->type == NBD_CMD_WRITE_ZEROES) {
          valid_flags |= NBD_CMD_FLAG_NO_HOLE;
+    } else if (request->type == NBD_CMD_BLOCK_STATUS) {
+        valid_flags |= NBD_CMD_FLAG_REQ_ONE;
      }
      if (request->flags & ~valid_flags) {
          error_setg(errp, "unsupported flags for command %s (got 0x%x)",
@@ -1650,6 +1945,19 @@ static coroutine_fn void nbd_trip(void *opaque)
          }

break;

+    case NBD_CMD_BLOCK_STATUS:
+        if (client->export_meta.base_allocation) {
+            ret = nbd_co_send_block_status(req->client, request.handle,
+                                           blk_bs(exp->blk), request.from,
+                                           request.len,
+                                           NBD_META_ID_BASE_ALLOCATION,
+                                           &local_err);


No check of client->export_meta.valid?

+        } else {
+            ret = -EINVAL;
+            error_setg(&local_err, "CMD_BLOCK_STATUS not negotiated");
+        }

At this point, we've either sent the final structured chunk, or we'vesent nothing and have ret < 0;...

+
+        break;
      default:
          error_setg(&local_err, "invalid request type (%" PRIu32 ") received",
                     request.type);
@@ -1680,7 +1988,7 @@ reply:


    if (client->structured_reply &&
        (ret < 0 || request.type == NBD_CMD_READ)) {

              ret = nbd_co_send_structured_done(req->client, request.handle,
                                                &local_err);
          }

...iIf the client negotiated structured reply, then we've sent the errormessage. But if the client did NOT negotiate structured reply, then...

-    } else {
+    } else if (request.type != NBD_CMD_BLOCK_STATUS) {

...we fail to reply at all. Oops. That's not good forinteroperability. You'll need to make sure that we reply with EINVALeven if the client (mistakenly) calls NBD_CMD_BLOCK_STATUS withoutnegotiating structured replies.

          ret = nbd_co_send_simple_reply(req->client, request.handle,
                                         ret < 0 ? -ret : 0,
                                         req->data, reply_data_len, &local_err);

Also missing from the patch - where do you validate that the export namein client->export_meta matches the export name in NBD_OPT_GO? If theydon't match, then export_meta should be discarded.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 3/9] nbd: BLOCK_STATUS for standard get_block_status function: server part

Reply via email to