On Tue, Feb 15, 2022 at 09:23:36PM +0200, Nir Soffer wrote:
> On Tue, Feb 15, 2022 at 7:22 PM Eric Blake <ebl...@redhat.com> wrote:
>
> > According to the NBD spec, a server advertising
> > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> > not see any cache inconsistencies: when properly separated by a single
> > flush, actions performed by one client will be visible to another
> > client, regardless of which client did the flush. We satisfy these
> > conditions in qemu when our block layer is backed by the local
> > filesystem (by virtue of the semantics of fdatasync(), and the fact
> > that qemu itself is not buffering writes beyond flushes). It is
> > harder to state whether we satisfy these conditions for network-based
> > protocols, so the safest course of action is to allow users to opt-in
> > to advertising multi-conn. We may later tweak defaults to advertise
> > by default when the block layer can confirm that the underlying
> > protocol driver is cache consistent between multiple writers, but for
> > now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
> > explicitly start qemu-nbd or qemu-storage-daemon with multi-conn
> > advertisement in a known-safe setup where the client end can then
> > benefit from parallel clients.
>
> It makes sense, and will be used by oVirt. Actually we have already been
> using multiple connections for writing for about 2 years, based on your
> promise that if every client writes to district areas this is safe.

I presume s/district/distinct/, but yes, I'm glad we're finally trying
to make the code match existing practice ;)
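For the archives, the contract being advertised boils down to this
(an untested sketch using libnbd's Python bindings; the socket path
and data pattern are made up for illustration):

import nbd

sock = "/tmp/server.sock"  # hypothetical socket of a multi-conn server

# Open two independent connections to the same export.
a, b = nbd.NBD(), nbd.NBD()
for h in (a, b):
    h.connect_unix(sock)
    assert h.can_multi_conn()  # server sent NBD_FLAG_CAN_MULTI_CONN

# A write completed on one connection, separated by a flush on *any*
# connection, must afterwards be visible on every connection.
a.pwrite(b"\x05" * 512, 0)
b.flush()
assert b.pread(512, 0) == b"\x05" * 512

for h in (a, b):
    h.shutdown()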
> > +++ b/docs/tools/qemu-nbd.rst
> > @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified.
> >  .. option:: -e, --shared=NUM
> >
> >    Allow up to *NUM* clients to share the device (default
> > -  ``1``), 0 for unlimited. Safe for readers, but for now,
> > -  consistency is not guaranteed between multiple writers.
> > +  ``1``), 0 for unlimited.
>
> Removing the note means that now consistency is guaranteed between
> multiple writers, no?
>
> Or maybe we want to mention here that consistency depends on the protocol
> and users can opt in, or refer to the section where this is discussed?

Yeah, a link to the QAPI docs where multi-conn is documented might be
nice, except I'm not sure the best way to do that in our sphinx
documentation setup.

> > +##
> > +# @NbdExportMultiConn:
> > +#
> > +# Possible settings for advertising NBD multiple client support.
> > +#
> > +# @off: Do not advertise multiple clients.
> > +#
> > +# @on: Allow multiple clients (for writable clients, this is only safe
> > +#      if the underlying BDS is cache-consistent, such as when backed
> > +#      by the raw file driver); ignored if the NBD server was set up
> > +#      with max-connections of 1.
> > +#
> > +# @auto: Behaves like @off if the export is writable, and @on if the
> > +#        export is read-only.
> > +#
> > +# Since: 7.0
> > +##
> > +{ 'enum': 'NbdExportMultiConn',
> > +  'data': ['off', 'on', 'auto'] }
>
> Are we going to have --multi-conn=(on|off|auto)?

Oh. The QMP command (which is immediately visible through
nbd-server-add/block-export-add to qemu and qemu-storage-daemon) gains
"multi-conn":"on", but you may be right that qemu-nbd would want a
command line option (either that, or we accelerate our plans that qsd
should replace qemu-nbd).
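To make that concrete, the QMP side would look something like this
(hand-rolled sketch, not tested; a real script would use the qemu.qmp
package, and the monitor socket path here is made up):

import json
import socket

def qmp(sock_path, *commands):
    # Minimal QMP client: read the greeting, negotiate capabilities,
    # then send each command and print its one-line JSON response
    # (ignores async events for brevity).
    s = socket.socket(socket.AF_UNIX)
    s.connect(sock_path)
    f = s.makefile("rw")
    json.loads(f.readline())  # server greeting
    for cmd in ({"execute": "qmp_capabilities"},) + commands:
        f.write(json.dumps(cmd) + "\n")
        f.flush()
        print(json.loads(f.readline()))
    s.close()

qmp("/tmp/qmp.sock",  # hypothetical QMP monitor socket
    {"execute": "block-export-add",
     "arguments": {"type": "nbd", "id": "e0", "node-name": "n",
                   "writable": True, "multi-conn": "on"}})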
> > +++ b/blockdev-nbd.c
> > @@ -44,6 +44,11 @@ bool nbd_server_is_running(void)
> >      return nbd_server || is_qemu_nbd;
> >  }
> >
> > +int nbd_server_max_connections(void)
> > +{
> > +    return nbd_server ? nbd_server->max_connections : -1;
> > +}
>
> -1 is a little bit strange for a limit, maybe 1 is a better default when
> nbd_server == NULL? When can this happen?

In qemu, if you haven't used the QMP command 'nbd-server-start' yet.
In qemu-nbd, always (per the nbd_server_is_running function just
above). My iotest only covered the qemu/qsd side, not the qemu-nbd
side, so it looks like I need a v3...

> > +++ b/nbd/server.c
> > +    /*
> > +     * Determine whether to advertise multi-conn. Default is auto,
> > +     * which resolves to on for read-only and off for writable. But
> > +     * if the server has max-connections 1, that forces the flag off.
>
> Looks good, this can be enabled automatically based on the driver
> if we want, so users using auto can upgrade to multi-conn automatically.

Yes, that's part of why I made it a tri-state with a default of 'auto'.

> > +     */
> > +    if (!arg->has_multi_conn) {
> > +        arg->multi_conn = NBD_EXPORT_MULTI_CONN_AUTO;
> > +    }
> > +    if (nbd_server_max_connections() == 1) {
> > +        arg->multi_conn = NBD_EXPORT_MULTI_CONN_OFF;
> > +    }
> > +    if (arg->multi_conn == NBD_EXPORT_MULTI_CONN_AUTO) {
> > +        multi_conn = readonly;
> > +    } else {
> > +        multi_conn = arg->multi_conn == NBD_EXPORT_MULTI_CONN_ON;
> > +    }
>
> This part is a little bit confusing - we do:
> - initialize arg->multi_conn if it has no value
> - set the temporary multi_conn based on the now-initialized arg->multi_conn
>
> I think it will be nicer to separate argument parsing, so there is no need
> to initialize it here or have arg->has_multi_conn, but I did not check how
> other arguments are handled.

arg->has_multi_conn is fallout from the fact that our QAPI must remain
back-compat. If it is false, the user did not pass "multi-conn":...,
so we want the default value of "auto". Once we've established the
default, then we force multi-conn off if we know we are limited to one
client (although as you pointed out, nbd_server_max_connections()
needs to be tested with qemu-nbd). Then, it's easier to resolve the
tri-state enum variable into the bool of what we actually advertise to
the NBD client.

> > +++ b/tests/qemu-iotests/tests/nbd-multiconn
> > @@ -0,0 +1,188 @@
> > +#!/usr/bin/env bash
> > +# group: rw auto quick
> > +#
> > +# Test that qemu-nbd MULTI_CONN works
> > +#
> > +echo
> > +echo "=== Initial image setup ==="
> > +echo
> > +
> > +_make_test_img 4M
> > +$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io
> > +_launch_qemu 2> >(_filter_nbd)
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add",
> > +  "arguments":{"driver":"qcow2", "node-name":"n",
> > +    "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return"

I'm not the best at writing python iotests; I welcome a language
translation of this aspect. But the shell code for
_launch_qemu/_send_qemu_cmd was already pretty nice for setting up a
long-running background qemu process where I can pass in QMP commands
at will, then interact from other processes.

> > +export nbd_unix_socket
> > +
> > +echo
> > +echo "=== Default nbd server settings ==="
> > +echo
> > +# Default allows for unlimited connections, readonly images advertise
> > +# multi-conn, and writable images do not
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
> > +  "arguments":{"addr":{"type":"unix",
> > +    "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
> > +  "arguments":{"type":"nbd", "id":"r", "node-name":"n",
> > +    "name":"r"}}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
> > +  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
> > +    "name":"w", "writable":true}}' "return"
> > +nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c '
> > +assert h.can_multi_conn()
> > +h.shutdown()
> > +print("nbdsh passed")'
> > +nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c '
> > +assert not h.can_multi_conn()
> > +h.shutdown()
> > +print("nbdsh passed")'
>
> Mixing shell and python is very confusing. Wouldn't it be much cleaner
> to write the test in python?

Here, nbdsh -c 'python snippet' is used as a shell command line
parameter. Writing python code to call out to a system() command where
one of the arguments to that command is a python script snippet is
going to be just as annoying as writing it in bash.
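For what it's worth, if the test does move to python, each per-export
check shrinks to something like this (rough sketch talking to libnbd
directly; the QMP setup and iotests boilerplate are elided, and the
socket path is hypothetical):

import nbd

def check_multi_conn(uri, expected):
    # One connection is enough to inspect what the server advertises.
    h = nbd.NBD()
    h.connect_uri(uri)
    assert h.can_multi_conn() == expected
    h.shutdown()

sock = "/tmp/nbd.sock"  # would come from the test harness
check_multi_conn("nbd+unix:///r?socket=" + sock, True)   # read-only export
check_multi_conn("nbd+unix:///w?socket=" + sock, False)  # writable export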
> > +echo
> > +echo "=== Demonstrate parallel writers ==="
> > +echo
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
> > +  "arguments":{"addr":{"type":"unix",
> > +    "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
> > +  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
> > +    "name":"", "multi-conn":"on", "writable":true}}' "return"
> > +
> > +nbdsh -c '
> > +import os
> > +sock = os.getenv("nbd_unix_socket")
> > +h = []
> > +
> > +for i in range(3):
> > +    h.append(nbd.NBD())
> > +    h[i].connect_unix(sock)
> > +    assert h[i].can_multi_conn()
>
> This is somewhat C in python, maybe:
>
> handles = [nbd.NBD() for _ in range(3)]
>
> for h in handles:
>     h.connect_unix(sock)
>     assert h.can_multi_conn()
>
> Since assert will abort the test, and we don't handle
> exceptions, failure will not shut down the connections,
> but it should not matter for the purpose of a test.

As I said, I'm open to python suggestions :) I like your approach.

> > +buf1 = h[0].pread(1024 * 1024, 0)
> > +if buf1 != b"\x01" * 1024 * 1024:
> > +    print("Unexpected initial read")
>
> Not clear where we initialize the buffer to \x01 - is this the qemu-io
> above?

Yes, when the qcow2 file was initially created.

> > +buf2 = b"\x03" * 1024 * 1024
> > +h[1].pwrite(buf2, 0)
> > +h[2].flush()
> > +buf1 = h[0].pread(1024 * 1024, 0)
> > +if buf1 == buf2:
> > +    print("Flush appears to be consistent across connections")
>
> buf1 was the initial content, buf2 is the new content, but now we
> overwrite buf1 to check that the write was flushed. It would be better
> to use different names like initial_data, updated_data, current_data.

Can do.

> This block is the most important part of the test, showing that we can
> write in one connection, flush in the second, and read the written block
> in the third. Maybe add a comment about this? I think it will help
> someone else trying to understand what this part is trying to test.

Can do.

> > +{"execute":"block-export-add",
> > +  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
> > +    "name":"", "multi-conn":"on", "writable":true}}
> > +{"return": {}}
> > +Flush appears to be consistent across connections
> > +{"execute":"block-export-del",
> > +  "arguments":{"id":"w"}}
> > +{"return": {}}
> > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}}
> > +{"execute":"nbd-server-stop"}
> > +{"return": {}}
> > +{"execute":"quit"}
> > +{"return": {}}
> > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
>
> Nothing in the contents here says anything about the actual test
> except the "Flush appears..." line.

Yeah, it's a lot of QMP debugging (to show what commands were run in
setting up the server), and less verbose on the nbdsh side. Do I need
to add more output during the nbdsh that uses multiple connections?

> > +*** done
> > --
> > 2.35.1
>
> Looks good, feel free to ignore the style comments and the suggestion
> to rewrite the test in python.

The style comments are nice, the rewriting is going to be harder for
me (but I'll accept help). At any rate, getting qemu-nbd to be
feature-compatible may require a v3 anyway.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org