16.02.2018 16:21, Eric Blake wrote:
On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:
Minimal implementation: only one extent in the server's answer is supported.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
---
continuing where I left off,
+++ b/nbd/common.c
@@ -75,6 +75,10 @@ const char *nbd_opt_lookup(uint32_t opt)
return "go";
case NBD_OPT_STRUCTURED_REPLY:
return "structured reply";
+ case NBD_OPT_LIST_META_CONTEXT:
+ return "list meta context";
+ case NBD_OPT_SET_META_CONTEXT:
+ return "set meta context";
default:
Should the changes to the header for new macros and to common.c for
mapping bits to names be split into a separate patch, so that someone
could backport just the new constants and then the client-side
implementation, rather than being forced to backport the rest of the
server implementation at the same time?
Not a problem, I can split it. I thought about it, but decided to send it as
one patch for the first roll.
+
+/* nbd_read_size_string
+ *
+ * Read string in format
+ * uint32_t len
+ * len bytes string (not 0-terminated)
+ *
+ * @buf should be enough to store @max_len+1
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_read_size_string(NBDClient *client, char *buf,
+ uint32_t max_len, Error **errp)
Would existing code benefit from using this helper? If so, splitting
it into a separate patch, plus converting initial clients to use it,
would be worthwhile.
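For reference (the helper body is snipped from this quote), a minimal sketch of
what such a helper might look like, assuming the nbd_opt_read() and
nbd_opt_invalid() helpers used elsewhere in this patch keep their visible
semantics (<= 0 means the error was already handled, 1 means success); the
error text is made up:

static int nbd_read_size_string(NBDClient *client, char *buf,
                                uint32_t max_len, Error **errp)
{
    int ret;
    uint32_t len;

    /* Read the 4-byte big-endian length prefix */
    ret = nbd_opt_read(client, &len, sizeof(len), errp);
    if (ret <= 0) {
        return ret;
    }
    cpu_to_be32s(&len); /* same in-place byteswap as for nb_queries below */

    if (len > max_len) {
        return nbd_opt_invalid(client, errp,
                               "string of %" PRIu32 " bytes is too long", len);
    }

    /* Read the string itself and NUL-terminate it (@buf holds max_len + 1) */
    ret = nbd_opt_read(client, buf, len, errp);
    if (ret <= 0) {
        return ret;
    }
    buf[len] = '\0';

    return 1;
}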
+
+static int nbd_negotiate_send_meta_context(NBDClient *client,
+ const char *context,
+ uint32_t context_id,
+ Error **errp)
No comment documenting this function?
+{
+ NBDOptionReplyMetaContext opt;
+ struct iovec iov[] = {
+ {.iov_base = &opt, .iov_len = sizeof(opt)},
+ {.iov_base = (void *)context, .iov_len = strlen(context)}
Casting to void* looks suspicious, but I see that it is casting away
const. Okay.
+ };
+
+ set_be_option_rep(&opt.h, client->opt, NBD_REP_META_CONTEXT,
+ sizeof(opt) - sizeof(opt.h) + iov[1].iov_len);
+ stl_be_p(&opt.context_id, context_id);
+
+ return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ?
-EIO : 0;
+}
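Something along these lines, mirroring the comment style of the other helpers
in this patch, would do (wording is only a suggestion):

/* nbd_negotiate_send_meta_context
 *
 * Send a single NBD_REP_META_CONTEXT reply carrying @context and the
 * @context_id the server will use for it, in response to
 * NBD_OPT_{LIST,SET}_META_CONTEXT.
 *
 * Return -errno on I/O error, 0 on success. */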
+
+static void nbd_meta_base_query(NBDExportMetaContexts *meta, const char *query)
+{
Again, function comments are useful.
+ if (query[0] == '\0' || strcmp(query, "allocation") == 0) {
+ /* Note: empty query should select all contexts within base
+ * namespace. */
+ meta->base_allocation = true;
From the client perspective, this handling of the empty leaf-name
works well for NBD_OPT_LIST_META_CONTEXT (I want to see what leaf
nodes the server supports), but not so well for
NBD_OPT_SET_META_CONTEXT (asking the server to send ALL base
allocations, even when I don't necessarily know how to interpret
anything other than base:allocation, is a waste). So this function
needs a bool parameter that says whether it is being invoked from
_LIST (empty string okay, to advertise ALL base leaf names back to
client, which for now is just base:allocation) or from _SET (empty
string is ignored as invalid; client has to specifically ask for
base:allocation by name).
"empty string is ignored as invalid", hm, do we have this in spec? I
think SET and LIST must select exactly the same sets of contexts. It is
strange behavior for a client to set "base:", but that is its decision. And I
don't think that it is invalid.
Formally we may answer with NBD_REP_ERR_TOO_BIG, but it would look weird,
as the client sees that both base: and base:allocation return _one_
context, yet in one case it is too big. But if we have several
base: contexts, the server may fairly answer with NBD_REP_ERR_TOO_BIG.
So, I think for now the code is ok.
Also, I don't see NBD_REP_ERR_TOO_BIG as a possible reply in the
NBD_OPT_LIST_META_CONTEXT description. Should it be there?
+ }
+}
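For illustration, the _LIST/_SET distinction Eric describes above could be
expressed roughly like this (the extra parameter and its name are only a
suggestion, not part of the patch):

static void nbd_meta_base_query(NBDExportMetaContexts *meta,
                                const char *query, bool is_list)
{
    if (query[0] == '\0') {
        /* An empty leaf name selects every leaf in the "base" namespace.
         * Eric suggests honoring that only for _LIST and ignoring it for
         * _SET; Vladimir argues above that _SET should match it too. */
        if (is_list) {
            meta->base_allocation = true;
        }
        return;
    }
    if (strcmp(query, "allocation") == 0) {
        meta->base_allocation = true;
    }
}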
+
+/* nbd_negotiate_meta_query
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_negotiate_meta_query(NBDClient *client,
+ NBDExportMetaContexts *meta, Error **errp)
+{
+ int ret;
+ char *query, *colon, *namespace, *subquery;
Is it worth stack-allocating query here, so you don't have to g_free()
it later? NBD limits the maximum string to 4k, which is a little bit
big for stack allocation (on an operating system with 4k pages,
allocating more than 4k on the stack in one function risks missing the
guard page on stack overflow), but we also have the benefit that we
KNOW that the set of meta-context namespaces that we support have a
much smaller maximum limit of what we even care about.
It is not stack-allocated; nbd_alloc_read_size_string calls g_malloc().
+
+ ret = nbd_alloc_read_size_string(client, &query, errp);
+ if (ret <= 0) {
+ return ret;
+ }
+
+ colon = strchr(query, ':');
+ if (colon == NULL) {
+ ret = nbd_opt_invalid(client, errp, "no colon in query");
+ goto out;
Hmm, that puts a slight wrinkle into my proposal, or else maybe it is
something I should bring up on the NBD list. If we only read 5
characters (because the max namespace WE support is "base:"), but a
client asks for namespace "X-longname:", we should gracefully ignore
the client's request; while we still want to reply with an error to a
client that asks for "garbage" with no colon at all. The question for
the NBD spec, then, is whether detecting bad client requests that
didn't use colon is mandatory for the server (meaning we MUST read the
entire namespace request, and search for the colon) or merely best
effort (we only have to read 5 characters, and if we silently ignore
instead of diagnose a bad namespace request that was longer than that,
oh well). Worded from the client, it switches to a question of whether
the client should expect the server to diagnose all requests, or must
be prepared for the server to ignore requests even where those
requests are bogus. Or, the NBD spec may change slightly to pass
namespace and leafname as separate fields, both with lengths, rather
than a colon, to make it easier for the server to skip over an unknown
namespace/leaf pair without having to parse whether a colon was
present. I'll send that in a separate email (the upstream NBD list
doesn't need to see all my review comments on this thread).
+ }
+ *colon = '\0';
+ namespace = query;
+ subquery = colon + 1;
+
+ if (strcmp(namespace, "base") == 0) {
+ nbd_meta_base_query(meta, subquery);
+ }
+
+out:
+ g_free(query);
+ return ret;
+}
+
+/* nbd_negotiate_meta_queries
+ * Handle NBD_OPT_LIST_META_CONTEXT and NBD_OPT_SET_META_CONTEXT
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_negotiate_meta_queries(NBDClient *client,
+ NBDExportMetaContexts *meta, Error **errp)
+{
The two options can mostly share code, but see my earlier comments
about how I think you need to distinguish between list ('base:' is a
valid query) and set ('base:' is an invalid request).
+ int ret;
+ NBDExport *exp;
+ NBDExportMetaContexts local_meta;
+ uint32_t nb_queries;
+ int i;
+
+ assert(client->structured_reply);
Hmm. Asserting here is bad if the client can get here without
negotiating structured reply (I'll have to see if you checked that in
the caller... [1])
+
+ if (meta == NULL) {
Bikeshedding - I like writing 'if (!meta) {' as it is fewer characters.
The function documentation should mention that 'meta' may be NULL from
the caller. But (without seeing the callers yet), why? Are you
arguing that _LIST uses local_meta (because it only needs something
for the duration of the reply) while _SET supplies a pre-existing meta
(that is associated with the server state, and used when the client
finally calls NBD_OPT_GO)?
yes.
+ meta = &local_meta;
+ }
+
+ memset(meta, 0, sizeof(*meta));
+
+ ret = nbd_read_size_string(client, meta->export_name,
+ NBD_MAX_NAME_SIZE, errp);
Revisiting a question I raised in my first half review - you saved the
name as part of the struct because we have to later compare that the
final OPT_SET export name matches the request during OPT_GO (if they
don't match, then we have no contexts to serve after all). So a
'const char *' won't work, but maybe the struct could use a 'char *'
pointing to malloc'd storage rather than char[MAX_NAME] that reserves
array space that is mostly unused for the typical name that is much
shorter than the maximum name length.
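Concretely, that would turn the struct into something like this (the exact
declaration and field comments are guesses based on how the fields are used in
this patch), with the read above switching to nbd_alloc_read_size_string(),
which is already present in the patch, instead of filling a fixed-size buffer:

typedef struct NBDExportMetaContexts {
    char *export_name;    /* g_strdup()'d instead of char[NBD_MAX_NAME_SIZE],
                             freed when the contexts are reset */
    bool valid;           /* negotiation of the option finished successfully */
    bool base_allocation; /* export base:allocation context (block status) */
} NBDExportMetaContexts;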
+ if (ret <= 0) {
+ return ret;
+ }
+
+ exp = nbd_export_find(meta->export_name);
+ if (exp == NULL) {
+ return nbd_opt_invalid(client, errp,
+ "export '%s' not present", meta->export_name);
Wrong error; I think NBD_REP_ERR_UNKNOWN is better here.
+ }
+
+ ret = nbd_opt_read(client, &nb_queries, sizeof(nb_queries), errp);
+ if (ret <= 0) {
+ return ret;
+ }
+ cpu_to_be32s(&nb_queries);
+
+ for (i = 0; i < nb_queries; ++i) {
+ ret = nbd_negotiate_meta_query(client, meta, errp);
+ if (ret <= 0) {
+ return ret;
+ }
+ }
Interesting choice - you parse all queries prior to preparing any
response. I guess you could also reply as you read, but it doesn't
change the fact that you have to remember what the final _SET selected
for use later on. So this works.
+
+ if (meta->base_allocation) {
+ ret = nbd_negotiate_send_meta_context(client, "base:allocation",
+ NBD_META_ID_BASE_ALLOCATION,
+ errp);
Since earlier you have #define NBD_META_ID_BASE_ALLOCATION 0, that
means you use the context ID of 0 in response to both _LIST (where the
spec says the server SHOULD send 0, but the client SHOULD ignore the
id field) and for _SET (where the client needs a unique ID for every
separate context that actually got selected - but since we only select
a single context, we get away with it). I suspect later patches will
add additional contexts where you'll have to actually worry about
context ids (and always sending 0 in response to _LIST), but for now,
this works :)
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
+ if (ret == 0) {
+ meta->valid = true;
+ }
Interesting - meta->valid can be true even if 0 contexts were
selected; I presume we use this later on to decide whether to reply
with EINVAL vs. an empty list if a client uses NBD_CMD_BLOCK_STATUS
without a valid _SET call. Then again, the NBD spec currently states
that NBD_CMD_BLOCK_STATUS should not be used even if _SET succeeded,
if the _SET didn't return at least one context.
+
+ return ret;
+}
+
/* nbd_negotiate_options
* Process all NBD_OPT_* client option commands, during fixed newstyle
* negotiation.
@@ -826,6 +1031,22 @@ static int nbd_negotiate_options(NBDClient *client, uint16_t myflags,
}
break;
+ case NBD_OPT_LIST_META_CONTEXT:
+ case NBD_OPT_SET_META_CONTEXT:
+ if (!client->structured_reply) {
+ ret = nbd_opt_invalid(
+ client, errp,
+ "request option '%s' when structured reply "
+ "is not negotiated", nbd_opt_lookup(option));
[1] okay, you did filter out clients that request out of order.
+ } else if (option == NBD_OPT_LIST_META_CONTEXT) {
+ ret = nbd_negotiate_meta_queries(client, NULL, errp);
+ } else {
+ ret = nbd_negotiate_meta_queries(client,
+ &client->export_meta, errp);
Okay, so you DO have a difference on whether you pass in 'meta' based
on whether it is a local response vs. setting things for later.
+ }
+ break;
+
default:
ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp,
"Unsupported option 0x%" PRIx32 " (%s)",
@@ -1446,6 +1667,78 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
return nbd_co_send_iov(client, iov, 1 + !!iov[1].iov_len, errp);
}
+static int blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, NBDExtent *extent)
+{
+ uint64_t tail_bytes = bytes;
+
+ while (tail_bytes) {
I might have named this 'remaining'
+ uint32_t flags;
+ int64_t num;
+ int ret = bdrv_block_status_above(bs, NULL, offset, tail_bytes, &num,
+ NULL, NULL);
+ if (ret < 0) {
+ return ret;
+ }
+
+ flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
+ (ret & BDRV_BLOCK_ZERO ? NBD_STATE_ZERO : 0);
Looks like the correct bit mapping to me.
+
+ if (tail_bytes == bytes) {
+ extent->flags = flags;
+ }
+
+ if (flags != extent->flags) {
+ break;
+ }
+
+ offset += num;
+ tail_bytes -= num;
+ }
+
+ cpu_to_be32s(&extent->flags);
+ extent->length = cpu_to_be32(bytes - tail_bytes);
This only computes one extent, but tries to find the longest extent
possible (if two consecutive bdrv_block_status calls return the same
status, perhaps because of fragmentation). Presumably this is called
in a loop to find a full list of extents covering the client's entire
request? [2]
Yes. And the small overhead is that it finds one extra extent of a different
type at the boundary, so this function can't be efficiently reused to produce
several extents. However, I don't see a real use for that now, as we don't have
multi-extent block-status support in the block layer. And the code for dirty
bitmaps would be separate anyway.
+
+ return 0;
+}
+
+/* nbd_co_send_extents
+ * @extents should be in big-endian */
+static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
+ NBDExtent *extents, unsigned nb_extents,
+ uint32_t context_id, Error **errp)
+{
+ NBDStructuredMeta chunk;
+
+ struct iovec iov[] = {
+ {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+ {.iov_base = extents, .iov_len = nb_extents * sizeof(extents[0])}
+ };
+
+ set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_BLOCK_STATUS,
+ handle, sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
This says you can only send a single chunk, which matches the fact
that right now you support exactly one context type. Presumably later
patches tweak this.
+ stl_be_p(&chunk.context_id, context_id);
+
+ return nbd_co_send_iov(client, iov, 2, errp);
+}
+
+static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
+ BlockDriverState *bs, uint64_t offset,
+ uint64_t length, uint32_t context_id,
+ Error **errp)
No function comment?
+{
+ int ret;
+ NBDExtent extent;
+
+ ret = blockstatus_to_extent_be(bs, offset, length, &extent);
+ if (ret < 0) {
+ return nbd_co_send_structured_error(
+ client, handle, -ret, "can't get block status", errp);
+ }
+
+ return nbd_co_send_extents(client, handle, &extent, 1, context_id, errp);
[2] Okay, no looping to answer the client's entire request, so that
may be something you add in a later patch, to keep this initial patch
minimal. But at least it matches your commit message.
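For what it's worth, a later multi-extent version of this caller might look
roughly like the sketch below (the extent cap and variable names are made up;
and, as Vladimir notes above, looping over blockstatus_to_extent_be() re-probes
every extent boundary twice, so a real version would more likely restructure
that helper instead):

static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
                                    BlockDriverState *bs, uint64_t offset,
                                    uint64_t length, uint32_t context_id,
                                    Error **errp)
{
    NBDExtent extents[16]; /* arbitrary cap, just for the sketch */
    unsigned nb_extents = 0;

    while (length && nb_extents < ARRAY_SIZE(extents)) {
        uint32_t ext_len;
        int ret = blockstatus_to_extent_be(bs, offset, length,
                                           &extents[nb_extents]);
        if (ret < 0) {
            return nbd_co_send_structured_error(
                client, handle, -ret, "can't get block status", errp);
        }
        /* the helper stores the extent length in big-endian */
        ext_len = be32_to_cpu(extents[nb_extents].length);
        offset += ext_len;
        length -= ext_len;
        nb_extents++;
    }

    return nbd_co_send_extents(client, handle, extents, nb_extents,
                               context_id, errp);
}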
+}
+
/* nbd_co_receive_request
* Collect a client request. Return 0 if request looks valid, -EIO to drop
* connection right away, and any other negative value to report an error to
@@ -1523,6 +1816,8 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
valid_flags |= NBD_CMD_FLAG_DF;
} else if (request->type == NBD_CMD_WRITE_ZEROES) {
valid_flags |= NBD_CMD_FLAG_NO_HOLE;
+ } else if (request->type == NBD_CMD_BLOCK_STATUS) {
+ valid_flags |= NBD_CMD_FLAG_REQ_ONE;
}
if (request->flags & ~valid_flags) {
error_setg(errp, "unsupported flags for command %s (got 0x%x)",
@@ -1650,6 +1945,19 @@ static coroutine_fn void nbd_trip(void *opaque)
}
break;
+ case NBD_CMD_BLOCK_STATUS:
+ if (client->export_meta.base_allocation) {
+ ret = nbd_co_send_block_status(req->client, request.handle,
+ blk_bs(exp->blk), request.from,
+ request.len,
+ NBD_META_ID_BASE_ALLOCATION,
+ &local_err);
No check of client->export_meta.valid?
Hmm, strange, it looks like I don't use .valid at all. Worth adding.
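Presumably the check would then become something like:

    if (client->export_meta.valid && client->export_meta.base_allocation) {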
+ } else {
+ ret = -EINVAL;
+ error_setg(&local_err, "CMD_BLOCK_STATUS not negotiated");
+ }
At this point, we've either sent the final structured chunk, or we've
sent nothing and have ret < 0;...
+
+ break;
default:
error_setg(&local_err, "invalid request type (%" PRIu32 ") received",
request.type);
@@ -1680,7 +1988,7 @@ reply:
if (client->structured_reply &&
(ret < 0 || request.type == NBD_CMD_READ)) {
ret = nbd_co_send_structured_done(req->client, request.handle,
&local_err);
}
...if the client negotiated structured reply, then we've sent the
error message. But if the client did NOT negotiate structured reply,
then...
- } else {
+ } else if (request.type != NBD_CMD_BLOCK_STATUS) {
...we fail to reply at all. Oops. That's not good for
interoperability. You'll need to make sure that we reply with EINVAL
even if the client (mistakenly) calls NBD_CMD_BLOCK_STATUS without
negotiating structured replies.
Good catch! I felt that I was doing something ugly when I was writing
this if. I'll think about how to refactor this a bit.
ret = nbd_co_send_simple_reply(req->client, request.handle,
ret < 0 ? -ret : 0,
req->data, reply_data_len,
&local_err);
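One minimal way out (just a sketch; the actual refactoring may well look
different) is to widen the fallback condition so that a failed BLOCK_STATUS
still gets a simple error reply when structured replies were never negotiated:

    } else if (request.type != NBD_CMD_BLOCK_STATUS || ret < 0) {
        /* Commands that always use a simple reply, plus a BLOCK_STATUS
         * that failed without structured replies; a failed BLOCK_STATUS
         * *with* structured replies was already answered above, and a
         * successful one has sent its chunk with NBD_REPLY_FLAG_DONE. */
        ret = nbd_co_send_simple_reply(req->client, request.handle,
                                       ret < 0 ? -ret : 0,
                                       req->data, reply_data_len,
                                       &local_err);
    }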
Also missing from the patch - where do you validate that the export
name in client->export_meta matches the export name in NBD_OPT_GO? If
they don't match, then export_meta should be discarded.
hm, me too.
Thank you for the careful review!
--
Best regards,
Vladimir