16.02.2018 16:21, Eric Blake wrote:
On 02/15/2018 07:51 AM, Vladimir Sementsov-Ogievskiy wrote:
Minimal implementation: only one extent in the server's answer is supported.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
---
continuing where I left off,
+++ b/nbd/common.c
@@ -75,6 +75,10 @@ const char *nbd_opt_lookup(uint32_t opt)
return "go";
case NBD_OPT_STRUCTURED_REPLY:
return "structured reply";
+ case NBD_OPT_LIST_META_CONTEXT:
+ return "list meta context";
+ case NBD_OPT_SET_META_CONTEXT:
+ return "set meta context";
default:
Should the changes to the header for new macros and to common.c for
mapping bits to names be split into a separate patch, so that someone
could backport just the new constants and then the client-side
implementation, rather than being forced to backport the rest of the
server implementation at the same time?
Not a problem, I can split it. I thought about it, but decided to send it as
one patch for the first roll.
+
+/* nbd_read_size_string
+ *
+ * Read string in format
+ * uint32_t len
+ * len bytes string (not 0-terminated)
+ *
+ * @buf should be enough to store @max_len+1
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_read_size_string(NBDClient *client, char *buf,
+ uint32_t max_len, Error **errp)
Would existing code benefit from using this helper? If so, splitting
it into a separate patch, plus converting initial clients to use it,
would be worthwhile.
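For reference (the helper body is snipped from this quote), a minimal sketch of
what such a helper might look like, assuming the nbd_opt_read() and
nbd_opt_invalid() helpers used elsewhere in this patch keep their visible
semantics (<= 0 means the error was already handled, 1 means success); the
error text is made up:

static int nbd_read_size_string(NBDClient *client, char *buf,
                                uint32_t max_len, Error **errp)
{
    int ret;
    uint32_t len;

    /* Read the 4-byte big-endian length prefix */
    ret = nbd_opt_read(client, &len, sizeof(len), errp);
    if (ret <= 0) {
        return ret;
    }
    cpu_to_be32s(&len); /* same in-place byteswap as for nb_queries below */

    if (len > max_len) {
        return nbd_opt_invalid(client, errp,
                               "string of %" PRIu32 " bytes is too long", len);
    }

    /* Read the string itself and NUL-terminate it (@buf holds max_len + 1) */
    ret = nbd_opt_read(client, buf, len, errp);
    if (ret <= 0) {
        return ret;
    }
    buf[len] = '\0';

    return 1;
}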
+
+static int nbd_negotiate_send_meta_context(NBDClient *client,
+ const char *context,
+ uint32_t context_id,
+ Error **errp)
No comment documenting this function?
+{
+ NBDOptionReplyMetaContext opt;
+ struct iovec iov[] = {
+ {.iov_base = &opt, .iov_len = sizeof(opt)},
+ {.iov_base = (void *)context, .iov_len = strlen(context)}
Casting to void* looks suspicious, but I see that it is casting away
const. Okay.
+ };
+
+ set_be_option_rep(&opt.h, client->opt, NBD_REP_META_CONTEXT,
+ sizeof(opt) - sizeof(opt.h) + iov[1].iov_len);
+ stl_be_p(&opt.context_id, context_id);
+
+ return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ?
-EIO : 0;
+}
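Something along these lines, mirroring the comment style of the other helpers
in this patch, would do (wording is only a suggestion):

/* nbd_negotiate_send_meta_context
 *
 * Send a single NBD_REP_META_CONTEXT reply carrying @context and the
 * @context_id the server will use for it, in response to
 * NBD_OPT_{LIST,SET}_META_CONTEXT.
 *
 * Return -errno on I/O error, 0 on success. */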
+
+static void nbd_meta_base_query(NBDExportMetaContexts *meta, const char *query)
+{
Again, function comments are useful.
+ if (query[0] == '\0' || strcmp(query, "allocation") == 0) {
+ /* Note: empty query should select all contexts within base
+ * namespace. */
+ meta->base_allocation = true;
From the client perspective, this handling of the empty leaf-name
works well for NBD_OPT_LIST_META_CONTEXT (I want to see what leaf
nodes the server supports), but not so well for
NBD_OPT_SET_META_CONTEXT (asking the server to send ALL base
allocations, even when I don't necessarily know how to interpret
anything other than base:allocation, is a waste). So this function
needs a bool parameter that says whether it is being invoked from
_LIST (empty string okay, to advertise ALL base leaf names back to
client, which for now is just base:allocation) or from _SET (empty
string is ignored as invalid; client has to specifically ask for
base:allocation by name).
"empty string is ignored as invalid", hm, do we have this in spec? I
think SET and LIST must select exactly the same sets of contexts. It is
strange behavior for a client to set "base:", but that is its decision. And I
don't think that it is invalid.
Formally we may answer with NBD_REP_ERR_TOO_BIG, but it would look weird,
as the client sees that both base: and base:allocation return _one_
context, yet in one case it is too big. But if we have several
base: contexts, the server may fairly answer with NBD_REP_ERR_TOO_BIG.
So, I think for now the code is ok.
Also, I don't see NBD_REP_ERR_TOO_BIG as a possible reply in the
NBD_OPT_LIST_META_CONTEXT description. Should it be there?
+ }
+}
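For illustration, the _LIST/_SET distinction Eric describes above could be
expressed roughly like this (the extra parameter and its name are only a
suggestion, not part of the patch):

static void nbd_meta_base_query(NBDExportMetaContexts *meta,
                                const char *query, bool is_list)
{
    if (query[0] == '\0') {
        /* An empty leaf name selects every leaf in the "base" namespace.
         * Eric suggests honoring that only for _LIST and ignoring it for
         * _SET; Vladimir argues above that _SET should match it too. */
        if (is_list) {
            meta->base_allocation = true;
        }
        return;
    }
    if (strcmp(query, "allocation") == 0) {
        meta->base_allocation = true;
    }
}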
+
+/* nbd_negotiate_meta_query
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_negotiate_meta_query(NBDClient *client,
+ NBDExportMetaContexts *meta, Error **errp)
+{
+ int ret;
+ char *query, *colon, *namespace, *subquery;
Is it worth stack-allocating query here, so you don't have to g_free()
it later? NBD limits the maximum string to 4k, which is a little bit
big for stack allocation (on an operating system with 4k pages,
allocating more than 4k on the stack in one function risks missing the
guard page on stack overflow), but we also have the benefit that we
KNOW that the set of meta-context namespaces that we support have a
much smaller maximum limit of what we even care about.
It is not stack-allocated; nbd_alloc_read_size_string calls g_malloc().
+
+ ret = nbd_alloc_read_size_string(client, &query, errp);
+ if (ret <= 0) {
+ return ret;
+ }
+
+ colon = strchr(query, ':');
+ if (colon == NULL) {
+ ret = nbd_opt_invalid(client, errp, "no colon in query");
+ goto out;
Hmm, that puts a slight wrinkle into my proposal, or else maybe it is
something I should bring up on the NBD list. If we only read 5
characters (because the max namespace WE support is "base:"), but a
client asks for namespace "X-longname:", we should gracefully ignore
the client's request; while we still want to reply with an error to a
client that asks for "garbage" with no colon at all. The question for
the NBD spec, then, is whether detecting bad client requests that
didn't use colon is mandatory for the server (meaning we MUST read the
entire namespace request, and search for the colon) or merely best
effort (we only have to read 5 characters, and if we silently ignore
instead of diagnose a bad namespace request that was longer than that,
oh well). Worded from the client, it switches to a question of whether
the client should expect the server to diagnose all requests, or must
be prepared for the server to ignore requests even where those
requests are bogus. Or, the NBD spec may change slightly to pass
namespace and leafname as separate fields, both with lengths, rather
than a colon, to make it easier for the server to skip over an unknown
namespace/leaf pair without having to parse whether a colon was
present. I'll send that in a separate email (the upstream NBD list
doesn't need to see all my review comments on this thread).
+ }
+ *colon = '\0';
+ namespace = query;
+ subquery = colon + 1;
+
+ if (strcmp(namespace, "base") == 0) {
+ nbd_meta_base_query(meta, subquery);
+ }
+
+out:
+ g_free(query);
+ return ret;
+}
+
+/* nbd_negotiate_meta_queries
+ * Handle NBD_OPT_LIST_META_CONTEXT and NBD_OPT_SET_META_CONTEXT
+ *
+ * Return -errno on I/O error, 0 if option was completely handled by
+ * sending a reply about inconsistent lengths, or 1 on success. */
+static int nbd_negotiate_meta_queries(NBDClient *client,
+ NBDExportMetaContexts *meta, Error **errp)
+{
The two options can mostly share code, but see my earlier comments
about how I think you need to distinguish between list ('base:' is a
valid query) and set ('base:' is an invalid request).
+ int ret;
+ NBDExport *exp;
+ NBDExportMetaContexts local_meta;
+ uint32_t nb_queries;
+ int i;
+
+ assert(client->structured_reply);
Hmm. Asserting here is bad if the client can get here without
negotiating structured reply (I'll have to see if you checked that in
the caller... [1])
+
+ if (meta == NULL) {
Bikeshedding - I like writing 'if (!meta) {' as it is fewer characters.
The function documentation should mention that 'meta' may be NULL from
the caller. But (without seeing the callers yet), why? Are you
arguing that _LIST uses local_meta (because it only needs something
for the duration of the reply) while _SET supplies a pre-existing meta
(that is associated with the server state, and used when the client
finally calls NBD_OPT_GO)?
yes.
+ meta = &local_meta;
+ }
+
+ memset(meta, 0, sizeof(*meta));
+
+ ret = nbd_read_size_string(client, meta->export_name,
+ NBD_MAX_NAME_SIZE, errp);
Revisiting a question I raised in my first half review - you saved the
name as part of the struct because we have to later compare that the
final OPT_SET export name matches the request during OPT_GO (if they
don't match, then we have no contexts to serve after all). So a
'const char *' won't work, but maybe the struct could use a 'char *'
pointing to malloc'd storage rather than char[MAX_NAME] that reserves
array space that is mostly unused for the typical name that is much
shorter than the maximum name length.
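Concretely, that would turn the struct into something like this (the exact
declaration and field comments are guesses based on how the fields are used in
this patch), with the read above switching to nbd_alloc_read_size_string(),
which is already present in the patch, instead of filling a fixed-size buffer:

typedef struct NBDExportMetaContexts {
    char *export_name;    /* g_strdup()'d instead of char[NBD_MAX_NAME_SIZE],
                             freed when the contexts are reset */
    bool valid;           /* negotiation of the option finished successfully */
    bool base_allocation; /* export base:allocation context (block status) */
} NBDExportMetaContexts;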
+ if (ret <= 0) {
+ return ret;
+ }
+
+ exp = nbd_export_find(meta->export_name);
+ if (exp == NULL) {
+ return nbd_opt_invalid(client, errp,
+ "export '%s' not present", meta->export_name);
Wrong error; I think NBD_REP_ERR_UNKNOWN is better here.
+ }
+
+ ret = nbd_opt_read(client, &nb_queries, sizeof(nb_queries), errp);
+ if (ret <= 0) {
+ return ret;
+ }
+ cpu_to_be32s(&nb_queries);
+
+ for (i = 0; i < nb_queries; ++i) {
+ ret = nbd_negotiate_meta_query(client, meta, errp);
+ if (ret <= 0) {
+ return ret;
+ }
+ }
Interesting choice - you parse all queries prior to preparing any
response. I guess you could also reply as you read, but it doesn't
change the fact that you have to remember what the final _SET selected
for use later on. So this works.
+
+ if (meta->base_allocation) {
+ ret = nbd_negotiate_send_meta_context(client, "base:allocation",
+ NBD_META_ID_BASE_ALLOCATION,
+ errp);
Since earlier you have #define NBD_META_ID_BASE_ALLOCATION 0, that
means you use the context ID of 0 in response to both _LIST (where the
spec says the server SHOULD send 0, but the client SHOULD ignore the
id field) and for _SET (where the client needs a unique ID for every
separate context that actually got selected - but since we only select
a single context, we get away with it). I suspect later patches will
add additional contexts where you'll have to actually worry about
context ids (and always sending 0 in response to _LIST), but for now,
this works :)
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
+ if (ret == 0) {
+ meta->valid = true;
+ }
Interesting - meta->valid can be true even if 0 contexts were
selected; I presume we use this later on to decide whether to reply
with EINVAL vs. an empty list if a client uses NBD_CMD_BLOCK_STATUS
without a valid _SET call. Then again, the NBD spec currently states
that NBD_CMD_BLOCK_STATUS should not be used even if _SET succeeded,
if the _SET didn't return at least one context.
+
+ return ret;
+}
+
/* nbd_negotiate_options
* Process all NBD_OPT_* client option commands, during fixed newstyle
* negotiation.
@@ -826,6 +1031,22 @@ static int nbd_negotiate_options(NBDClient *client, uint16_t myflags,
}
break;
+ case NBD_OPT_LIST_META_CONTEXT:
+ case NBD_OPT_SET_META_CONTEXT:
+ if (!client->structured_reply) {
+ ret = nbd_opt_invalid(
+ client, errp,
+ "request option '%s' when structured reply "
+ "is not negotiated", nbd_opt_lookup(option));
[1] okay, you did filter out clients that request out of order.
+ } else if (option == NBD_OPT_LIST_META_CONTEXT) {
+ ret = nbd_negotiate_meta_queries(client, NULL, errp);
+ } else {
+ ret = nbd_negotiate_meta_queries(client,
+ &client->export_meta, errp);
Okay, so you DO have a difference on whether you pass in 'meta' based
on whether it is a local response vs. setting things for later.
+ }
+ break;
+
default:
ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp,
"Unsupported option 0x%" PRIx32 " (%s)",
@@ -1446,6 +1667,78 @@ static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
return nbd_co_send_iov(client, iov, 1 + !!iov[1].iov_len, errp);
}
+static int blockstatus_to_extent_be(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, NBDExtent *extent)
+{
+ uint64_t tail_bytes = bytes;
+
+ while (tail_bytes) {
I might have named this 'remaining'
+ uint32_t flags;
+ int64_t num;
+ int ret = bdrv_block_status_above(bs, NULL, offset, tail_bytes, &num,
+ NULL, NULL);
+ if (ret < 0) {
+ return ret;
+ }
+
+ flags = (ret & BDRV_BLOCK_ALLOCATED ? 0 : NBD_STATE_HOLE) |
+ (ret & BDRV_BLOCK_ZERO ? NBD_STATE_ZERO : 0);
Looks like the correct bit mapping to me.
+
+ if (tail_bytes == bytes) {
+ extent->flags = flags;
+ }
+
+ if (flags != extent->flags) {
+ break;
+ }
+
+ offset += num;
+ tail_bytes -= num;
+ }
+
+ cpu_to_be32s(&extent->flags);
+ extent->length = cpu_to_be32(bytes - tail_bytes);
This only computes one extent, but tries to find the longest extent
possible (if two consecutive bdrv_block_status calls return the same
status, perhaps because of fragmentation). Presumably this is called
in a loop to find a full list of extents covering the client's entire
request? [2]
Yes. And the small overhead is that it finds one extra extent of a different
type at the boundary, so this function can't be efficiently reused to produce
several extents. However, I don't see a real use for that now, as we don't have
multi-extent block-status support in the block layer. And the code for dirty
bitmaps would be separate anyway.
+
+ return 0;
+}
+
+/* nbd_co_send_extents
+ * @extents should be in big-endian */
+static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
+ NBDExtent *extents, unsigned nb_extents,
+ uint32_t context_id, Error **errp)
+{
+ NBDStructuredMeta chunk;
+
+ struct iovec iov[] = {
+ {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+ {.iov_base = extents, .iov_len = nb_extents * sizeof(extents[0])}
+ };
+
+ set_be_chunk(&chunk.h, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_BLOCK_STATUS,
+ handle, sizeof(chunk) - sizeof(chunk.h) + iov[1].iov_len);
This says you can only send a single chunk, which matches the fact
that right now you support exactly one context type. Presumably later
patches tweak this.
+ stl_be_p(&chunk.context_id, context_id);
+
+ return nbd_co_send_iov(client, iov, 2, errp);
+}
+
+static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
+ BlockDriverState *bs, uint64_t offset,
+ uint64_t length, uint32_t context_id,
+ Error **errp)
No function comment?
+{
+ int ret;
+ NBDExtent extent;
+
+ ret = blockstatus_to_extent_be(bs, offset, length, &extent);
+ if (ret < 0) {
+ return nbd_co_send_structured_error(
+ client, handle, -ret, "can't get block status", errp);
+ }
+
+ return nbd_co_send_extents(client, handle, &extent, 1, context_id, errp);
[2] Okay, no looping to answer the client's entire request, so that
may be something you add in a later patch, to keep this initial patch
minimal. But at least it matches your commit message.
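For what it's worth, a later multi-extent version of this caller might look
roughly like the sketch below (the extent cap and variable names are made up;
and, as Vladimir notes above, looping over blockstatus_to_extent_be() re-probes
every extent boundary twice, so a real version would more likely restructure
that helper instead):

static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
                                    BlockDriverState *bs, uint64_t offset,
                                    uint64_t length, uint32_t context_id,
                                    Error **errp)
{
    NBDExtent extents[16]; /* arbitrary cap, just for the sketch */
    unsigned nb_extents = 0;

    while (length && nb_extents < ARRAY_SIZE(extents)) {
        uint32_t ext_len;
        int ret = blockstatus_to_extent_be(bs, offset, length,
                                           &extents[nb_extents]);
        if (ret < 0) {
            return nbd_co_send_structured_error(
                client, handle, -ret, "can't get block status", errp);
        }
        /* the helper stores the extent length in big-endian */
        ext_len = be32_to_cpu(extents[nb_extents].length);
        offset += ext_len;
        length -= ext_len;
        nb_extents++;
    }

    return nbd_co_send_extents(client, handle, extents, nb_extents,
                               context_id, errp);
}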
+}
+
/* nbd_co_receive_request
* Collect a client request. Return 0 if request looks valid, -EIO to drop
* connection right away, and any other negative value to report an error to
@@ -1523,6 +1816,8 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
valid_flags |= NBD_CMD_FLAG_DF;
} else if (request->type == NBD_CMD_WRITE_ZEROES) {
valid_flags |= NBD_CMD_FLAG_NO_HOLE;
+ } else if (request->type == NBD_CMD_BLOCK_STATUS) {
+ valid_flags |= NBD_CMD_FLAG_REQ_ONE;
}
if (request->flags & ~valid_flags) {
error_setg(errp, "unsupported flags for command %s (got 0x%x)",
@@ -1650,6 +1945,19 @@ static coroutine_fn void nbd_trip(void *opaque)
}
break;
+ case NBD_CMD_BLOCK_STATUS:
+ if (client->export_meta.base_allocation) {
+ ret = nbd_co_send_block_status(req->client, request.handle,
+ blk_bs(exp->blk), request.from,
+ request.len,
+ NBD_META_ID_BASE_ALLOCATION,
+ &local_err);
No check of client->export_meta.valid?
Hmm, strange, it looks like I don't use .valid at all. Worth adding.
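Presumably the check would then become something like:

    if (client->export_meta.valid && client->export_meta.base_allocation) {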
+ } else {
+ ret = -EINVAL;
+ error_setg(&local_err, "CMD_BLOCK_STATUS not negotiated");
+ }
At this point, we've either sent the final structured chunk, or we've
sent nothing and have ret < 0;...
+
+ break;
default:
error_setg(&local_err, "invalid request type (%" PRIu32 ") received",
request.type);
@@ -1680,7 +1988,7 @@ reply:
if (client->structured_reply &&
(ret < 0 || request.type == NBD_CMD_READ)) {
ret = nbd_co_send_structured_done(req->client, request.handle,
&local_err);
}
...if the client negotiated structured reply, then we've sent the
error message. But if the client did NOT negotiate structured reply,
then...
- } else {
+ } else if (request.type != NBD_CMD_BLOCK_STATUS) {
...we fail to reply at all. Oops. That's not good for
interoperability. You'll need to make sure that we reply with EINVAL
even if the client (mistakenly) calls NBD_CMD_BLOCK_STATUS without
negotiating structured replies.
Good catch! I felt that I was doing something ugly when I was writing
this if. I'll think about how to refactor this a bit.
ret = nbd_co_send_simple_reply(req->client, request.handle,
ret < 0 ? -ret : 0,
req->data, reply_data_len,
&local_err);
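One minimal way out (just a sketch; the actual refactoring may well look
different) is to widen the fallback condition so that a failed BLOCK_STATUS
still gets a simple error reply when structured replies were never negotiated:

    } else if (request.type != NBD_CMD_BLOCK_STATUS || ret < 0) {
        /* Commands that always use a simple reply, plus a BLOCK_STATUS
         * that failed without structured replies; a failed BLOCK_STATUS
         * *with* structured replies was already answered above, and a
         * successful one has sent its chunk with NBD_REPLY_FLAG_DONE. */
        ret = nbd_co_send_simple_reply(req->client, request.handle,
                                       ret < 0 ? -ret : 0,
                                       req->data, reply_data_len,
                                       &local_err);
    }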
Also missing from the patch - where do you validate that the export
name in client->export_meta matches the export name in NBD_OPT_GO? If
they don't match, then export_meta should be discarded.
hm, me too.
Thank you for the careful review!
--
Best regards,
Vladimir