On Tue, Feb 15, 2022 at 09:23:36PM +0200, Nir Soffer wrote:
> On Tue, Feb 15, 2022 at 7:22 PM Eric Blake <ebl...@redhat.com> wrote:
>
> > According to the NBD spec, a server advertising
> > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> > not see any cache inconsistencies: when properly separated by a single
> > flush, actions performed by one client will be visible to another
> > client, regardless of which client did the flush. We satisfy these
> > conditions in qemu when our block layer is backed by the local
> > filesystem (by virtue of the semantics of fdatasync(), and the fact
> > that qemu itself is not buffering writes beyond flushes). It is
> > harder to state whether we satisfy these conditions for network-based
> > protocols, so the safest course of action is to allow users to opt-in
> > to advertising multi-conn. We may later tweak defaults to advertise
> > by default when the block layer can confirm that the underlying
> > protocol driver is cache consistent between multiple writers, but for
> > now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
> > explicitly start qemu-nbd or qemu-storage-daemon with multi-conn
> > advertisement in a known-safe setup where the client end can then
> > benefit from parallel clients.
>
> It makes sense, and will be used by oVirt. Actually we have already been
> using multiple connections for writing for about 2 years, based on your
> promise that if every client writes to district areas this is safe.

I presume s/district/distinct/, but yes, I'm glad we're finally trying
to make the code match existing practice ;)
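For the archives, the contract being advertised boils down to this
(an untested sketch using libnbd's Python bindings; the socket path
and data pattern are made up for illustration):

import nbd

sock = "/tmp/server.sock"  # hypothetical socket of a multi-conn server

# Open two independent connections to the same export.
a, b = nbd.NBD(), nbd.NBD()
for h in (a, b):
    h.connect_unix(sock)
    assert h.can_multi_conn()  # server sent NBD_FLAG_CAN_MULTI_CONN

# A write completed on one connection, separated by a flush on *any*
# connection, must afterwards be visible on every connection.
a.pwrite(b"\x05" * 512, 0)
b.flush()
assert b.pread(512, 0) == b"\x05" * 512

for h in (a, b):
    h.shutdown()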
> > +++ b/docs/tools/qemu-nbd.rst
> > @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified.
> >  .. option:: -e, --shared=NUM
> >
> >    Allow up to *NUM* clients to share the device (default
> > -  ``1``), 0 for unlimited. Safe for readers, but for now,
> > -  consistency is not guaranteed between multiple writers.
> > +  ``1``), 0 for unlimited.
>
> Removing the note means that now consistency is guaranteed between
> multiple writers, no?
>
> Or maybe we want to mention here that consistency depends on the protocol
> and users can opt in, or refer to the section where this is discussed?

Yeah, a link to the QAPI docs where multi-conn is documented might be
nice, except I'm not sure the best way to do that in our sphinx
documentation setup.

> > +##
> > +# @NbdExportMultiConn:
> > +#
> > +# Possible settings for advertising NBD multiple client support.
> > +#
> > +# @off: Do not advertise multiple clients.
> > +#
> > +# @on: Allow multiple clients (for writable clients, this is only safe
> > +#      if the underlying BDS is cache-consistent, such as when backed
> > +#      by the raw file driver); ignored if the NBD server was set up
> > +#      with max-connections of 1.
> > +#
> > +# @auto: Behaves like @off if the export is writable, and @on if the
> > +#        export is read-only.
> > +#
> > +# Since: 7.0
> > +##
> > +{ 'enum': 'NbdExportMultiConn',
> > +  'data': ['off', 'on', 'auto'] }
>
> Are we going to have --multi-conn=(on|off|auto)?

Oh. The QMP command (which is immediately visible through
nbd-server-add/block-export-add to qemu and qemu-storage-daemon) gains
"multi-conn":"on", but you may be right that qemu-nbd would want a
command line option (either that, or we accelerate our plans that qsd
should replace qemu-nbd).
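To make that concrete, the QMP side would look something like this
(hand-rolled sketch, not tested; a real script would use the qemu.qmp
package, and the monitor socket path here is made up):

import json
import socket

def qmp(sock_path, *commands):
    # Minimal QMP client: read the greeting, negotiate capabilities,
    # then send each command and print its one-line JSON response
    # (ignores async events for brevity).
    s = socket.socket(socket.AF_UNIX)
    s.connect(sock_path)
    f = s.makefile("rw")
    json.loads(f.readline())  # server greeting
    for cmd in ({"execute": "qmp_capabilities"},) + commands:
        f.write(json.dumps(cmd) + "\n")
        f.flush()
        print(json.loads(f.readline()))
    s.close()

qmp("/tmp/qmp.sock",  # hypothetical QMP monitor socket
    {"execute": "block-export-add",
     "arguments": {"type": "nbd", "id": "e0", "node-name": "n",
                   "writable": True, "multi-conn": "on"}})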
> > +++ b/blockdev-nbd.c
> > @@ -44,6 +44,11 @@ bool nbd_server_is_running(void)
> >      return nbd_server || is_qemu_nbd;
> >  }
> >
> > +int nbd_server_max_connections(void)
> > +{
> > +    return nbd_server ? nbd_server->max_connections : -1;
> > +}
>
> -1 is a little bit strange for a limit, maybe 1 is a better default when
> nbd_server == NULL? When can this happen?

In qemu, if you haven't used the QMP command 'nbd-server-start' yet.
In qemu-nbd, always (per the nbd_server_is_running function just
above). My iotest only covered the qemu/qsd side, not the qemu-nbd
side, so it looks like I need a v3...

> > +++ b/nbd/server.c
> > +    /*
> > +     * Determine whether to advertise multi-conn. Default is auto,
> > +     * which resolves to on for read-only and off for writable. But
> > +     * if the server has max-connections 1, that forces the flag off.
>
> Looks good, this can be enabled automatically based on the driver
> if we want, so users using auto can upgrade to multi-conn automatically.

Yes, that's part of why I made it a tri-state with a default of 'auto'.

> > +     */
> > +    if (!arg->has_multi_conn) {
> > +        arg->multi_conn = NBD_EXPORT_MULTI_CONN_AUTO;
> > +    }
> > +    if (nbd_server_max_connections() == 1) {
> > +        arg->multi_conn = NBD_EXPORT_MULTI_CONN_OFF;
> > +    }
> > +    if (arg->multi_conn == NBD_EXPORT_MULTI_CONN_AUTO) {
> > +        multi_conn = readonly;
> > +    } else {
> > +        multi_conn = arg->multi_conn == NBD_EXPORT_MULTI_CONN_ON;
> > +    }
>
> This part is a little bit confusing - we do:
> - initialize arg->multi_conn if it has no value
> - set the temporary multi_conn based on the now-initialized arg->multi_conn
>
> I think it will be nicer to separate argument parsing, so there is no need
> to initialize it here or have arg->has_multi_conn, but I did not check how
> other arguments are handled.

arg->has_multi_conn is fallout from the fact that our QAPI must remain
back-compat. If it is false, the user did not pass "multi-conn":...,
so we want the default value of "auto". Once we've established the
default, then we force multi-conn off if we know we are limited to one
client (although as you pointed out, nbd_server_max_connections()
needs to be tested with qemu-nbd). Then, it's easier to resolve the
tri-state enum variable into the bool of what we actually advertise to
the NBD client.

> > +++ b/tests/qemu-iotests/tests/nbd-multiconn
> > @@ -0,0 +1,188 @@
> > +#!/usr/bin/env bash
> > +# group: rw auto quick
> > +#
> > +# Test that qemu-nbd MULTI_CONN works
> > +#
> > +echo
> > +echo "=== Initial image setup ==="
> > +echo
> > +
> > +_make_test_img 4M
> > +$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io
> > +_launch_qemu 2> >(_filter_nbd)
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add",
> > +  "arguments":{"driver":"qcow2", "node-name":"n",
> > +    "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return"

I'm not the best at writing python iotests; I welcome a language
translation of this aspect. But the shell code for
_launch_qemu/_send_qemu_cmd was already pretty nice for setting up a
long-running background qemu process where I can pass in QMP commands
at will, then interact from other processes.

> > +export nbd_unix_socket
> > +
> > +echo
> > +echo "=== Default nbd server settings ==="
> > +echo
> > +# Default allows for unlimited connections, readonly images advertise
> > +# multi-conn, and writable images do not
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
> > +  "arguments":{"addr":{"type":"unix",
> > +    "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
> > +  "arguments":{"type":"nbd", "id":"r", "node-name":"n",
> > +    "name":"r"}}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
> > +  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
> > +    "name":"w", "writable":true}}' "return"
> > +nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c '
> > +assert h.can_multi_conn()
> > +h.shutdown()
> > +print("nbdsh passed")'
> > +nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c '
> > +assert not h.can_multi_conn()
> > +h.shutdown()
> > +print("nbdsh passed")'
>
> Mixing shell and python is very confusing. Wouldn't it be much cleaner
> to write the test in python?

Here, nbdsh -c 'python snippet' is used as a shell command line
parameter. Writing python code to call out to a system() command where
one of the arguments to that command is a python script snippet is
going to be just as annoying as writing it in bash.
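For what it's worth, if the test does move to python, each per-export
check shrinks to something like this (rough sketch talking to libnbd
directly; the QMP setup and iotests boilerplate are elided, and the
socket path is hypothetical):

import nbd

def check_multi_conn(uri, expected):
    # One connection is enough to inspect what the server advertises.
    h = nbd.NBD()
    h.connect_uri(uri)
    assert h.can_multi_conn() == expected
    h.shutdown()

sock = "/tmp/nbd.sock"  # would come from the test harness
check_multi_conn("nbd+unix:///r?socket=" + sock, True)   # read-only export
check_multi_conn("nbd+unix:///w?socket=" + sock, False)  # writable export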
> > +echo
> > +echo "=== Demonstrate parallel writers ==="
> > +echo
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
> > +  "arguments":{"addr":{"type":"unix",
> > +    "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
> > +  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
> > +    "name":"", "multi-conn":"on", "writable":true}}' "return"
> > +
> > +nbdsh -c '
> > +import os
> > +sock = os.getenv("nbd_unix_socket")
> > +h = []
> > +
> > +for i in range(3):
> > +    h.append(nbd.NBD())
> > +    h[i].connect_unix(sock)
> > +    assert h[i].can_multi_conn()
>
> This is somewhat C in python, maybe:
>
> handles = [nbd.NBD() for _ in range(3)]
>
> for h in handles:
>     h.connect_unix(sock)
>     assert h.can_multi_conn()
>
> Since assert will abort the test, and we don't handle
> exceptions, failure will not shut down the connections,
> but it should not matter for the purpose of a test.

As I said, I'm open to python suggestions :) I like your approach.

> > +buf1 = h[0].pread(1024 * 1024, 0)
> > +if buf1 != b"\x01" * 1024 * 1024:
> > +    print("Unexpected initial read")
>
> Not clear where we initialize the buffer to \x01 - is this the qemu-io
> above?

Yes, when the qcow2 file was initially created.

> > +buf2 = b"\x03" * 1024 * 1024
> > +h[1].pwrite(buf2, 0)
> > +h[2].flush()
> > +buf1 = h[0].pread(1024 * 1024, 0)
> > +if buf1 == buf2:
> > +    print("Flush appears to be consistent across connections")
>
> buf1 was the initial content, buf2 is the new content, but now we
> overwrite buf1 to check that the write was flushed. It would be better
> to use different names like initial_data, updated_data, current_data.

Can do.

> This block is the most important part of the test, showing that we can
> write in one connection, flush in the second, and read the written block
> in the third. Maybe add a comment about this? I think it will help
> someone else trying to understand what this part is trying to test.

Can do.

> > +{"execute":"block-export-add",
> > +  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
> > +    "name":"", "multi-conn":"on", "writable":true}}
> > +{"return": {}}
> > +Flush appears to be consistent across connections
> > +{"execute":"block-export-del",
> > +  "arguments":{"id":"w"}}
> > +{"return": {}}
> > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}}
> > +{"execute":"nbd-server-stop"}
> > +{"return": {}}
> > +{"execute":"quit"}
> > +{"return": {}}
> > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
>
> Nothing in the contents here says anything about the actual test
> except the "Flush appears..." line.

Yeah, it's a lot of QMP debugging (to show what commands were run in
setting up the server), and less verbose on the nbdsh side. Do I need
to add more output during the nbdsh that uses multiple connections?

> > +*** done
> > --
> > 2.35.1
>
> Looks good, feel free to ignore the style comments and the suggestion
> to rewrite the test in python.

The style comments are nice, the rewriting is going to be harder for
me (but I'll accept help). At any rate, getting qemu-nbd to be
feature-compatible may require a v3 anyway.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org