16.01.2019 19:58, Daniel P. Berrangé wrote: > On Wed, Jan 16, 2019 at 10:25:03AM -0600, Eric Blake wrote: >> [adding Dan] >> >> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote: >>> To implement reconnect we need several states for the client: >>> CONNECTED, QUIT and two CONNECTING states. CONNECTING states will >>> be realized in the following patches. This patch implements CONNECTED >>> and QUIT. >>> >>> QUIT means, that we should close the connection and fail all current >>> and further requests (like old quit = true). >>> >>> CONNECTED means that connection is ok, we can send requests (like old >>> quit = false). >>> >>> For receiving loop we use a comparison of the current state with QUIT, >>> because reconnect will be in the same loop, so it should be looping >>> until the end. >>> >>> Opposite, for requests we use a comparison of the current state with >>> CONNECTED, as we don't want to send requests in CONNECTING states ( >>> which are unreachable now, but will be reachable after the following >>> commits) >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> >>> --- >>> block/nbd-client.h | 9 ++++++++- >>> block/nbd-client.c | 55 >>> ++++++++++++++++++++++++++++++++---------------------- >>> 2 files changed, 41 insertions(+), 23 deletions(-) >> >> Dan just recently proposed patches to SocketChardev in general to use a >> state machine that distinguishes between connecting and connected: >> >> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03339.html >> >> I'm wondering how much of his work is related or can be reused to get >> restartable connections on NBD sockets? > > There's nothing really special about what I did. Vladimir looks to > have basically done the same kind of approach, but I don't think > there's real scope for sharing with chardevs, as each care about > their own set of states. > >> Remember, right now, the NBD code always starts in blocking mode, and >> does single-threaded handshaking until it is ready for transmission, >> then switches to non-blocking mode for all subsequent transmissions (so, >> for example, servicing a read request can assume that the socket is >> valid without further waiting). But once we start allowing reconnects, >> a read request will need to detect when one socket has gone down, and >> wait for its replacement socket to come back up, in order to retry the >> request; this retry is in a context where we are in non-blocking >> context, but the retry must establish a new socket, and possibly convert >> the socket into TLS mode, all before being ready to retry the read request. > > That makes it sound like the NBD handshake needs to be converted to > use entirely non-blocking I/O. > > The TLS handshake already uses an asynchronous callback pattern and > to deal with that NBD had to create & run a private main loop to > complete the TLS handshake in its blocking code pattern. > > You could potentially push this concept up to the top level. ie > implement the entire NBD handshake with async callbacks / non-blocking > I/O. Then simply use a private main loop to run that in a blocking > fashion for the initial connection. When you need to do re-connect > you now just run the async code without the extra main loop around > it. >
Hmm, you mean this code: data.loop = g_main_loop_new(g_main_context_default(), FALSE); trace_nbd_receive_starttls_tls_handshake(); qio_channel_tls_handshake(tioc, nbd_tls_handshake, &data, NULL, NULL); if (!data.complete) { g_main_loop_run(data.loop); } g_main_loop_unref(data.loop); What this does in context of Qemu? Isn't it more correct to do coroutine based async staff, like in qcow2_open(): if (qemu_in_coroutine()) { /* From bdrv_co_create. */ qcow2_open_entry(&qoc); } else { assert(qemu_get_current_aio_context() == qemu_get_aio_context()); qemu_coroutine_enter(qemu_coroutine_create(qcow2_open_entry, &qoc)); BDRV_POLL_WHILE(bs, qoc.ret == -EINPROGRESS); } return qoc.ret; And then yield after handshake() and enter back from nbd_tls_handshake callback? Hmm, also, checked, nobody calls g_main_context_default() in qemu, except util/main-loop.c, nbd and tests. So, I'm not sure that this is a valid thing to do in nbd.. -- Best regards, Vladimir