05.02.2019 19:35, Vladimir Sementsov-Ogievskiy wrote: > 16.01.2019 19:58, Daniel P. Berrangé wrote: >> On Wed, Jan 16, 2019 at 10:25:03AM -0600, Eric Blake wrote: >>> [adding Dan] >>> >>> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote: >>>> To implement reconnect we need several states for the client: >>>> CONNECTED, QUIT and two CONNECTING states. CONNECTING states will >>>> be realized in the following patches. This patch implements CONNECTED >>>> and QUIT. >>>> >>>> QUIT means, that we should close the connection and fail all current >>>> and further requests (like old quit = true). >>>> >>>> CONNECTED means that connection is ok, we can send requests (like old >>>> quit = false). >>>> >>>> For receiving loop we use a comparison of the current state with QUIT, >>>> because reconnect will be in the same loop, so it should be looping >>>> until the end. >>>> >>>> Opposite, for requests we use a comparison of the current state with >>>> CONNECTED, as we don't want to send requests in CONNECTING states ( >>>> which are unreachable now, but will be reachable after the following >>>> commits) >>>> >>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> >>>> --- >>>> block/nbd-client.h | 9 ++++++++- >>>> block/nbd-client.c | 55 >>>> ++++++++++++++++++++++++++++++++---------------------- >>>> 2 files changed, 41 insertions(+), 23 deletions(-) >>> >>> Dan just recently proposed patches to SocketChardev in general to use a >>> state machine that distinguishes between connecting and connected: >>> >>> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03339.html >>> >>> I'm wondering how much of his work is related or can be reused to get >>> restartable connections on NBD sockets? >> >> There's nothing really special about what I did. Vladimir looks to >> have basically done the same kind of approach, but I don't think >> there's real scope for sharing with chardevs, as each care about >> their own set of states. >> >>> Remember, right now, the NBD code always starts in blocking mode, and >>> does single-threaded handshaking until it is ready for transmission, >>> then switches to non-blocking mode for all subsequent transmissions (so, >>> for example, servicing a read request can assume that the socket is >>> valid without further waiting). But once we start allowing reconnects, >>> a read request will need to detect when one socket has gone down, and >>> wait for its replacement socket to come back up, in order to retry the >>> request; this retry is in a context where we are in non-blocking >>> context, but the retry must establish a new socket, and possibly convert >>> the socket into TLS mode, all before being ready to retry the read request. >> >> That makes it sound like the NBD handshake needs to be converted to >> use entirely non-blocking I/O. >> >> The TLS handshake already uses an asynchronous callback pattern and >> to deal with that NBD had to create & run a private main loop to >> complete the TLS handshake in its blocking code pattern. >> >> You could potentially push this concept up to the top level. ie >> implement the entire NBD handshake with async callbacks / non-blocking >> I/O. Then simply use a private main loop to run that in a blocking >> fashion for the initial connection. When you need to do re-connect >> you now just run the async code without the extra main loop around >> it. >> > > Hmm, you mean this code: > > data.loop = g_main_loop_new(g_main_context_default(), FALSE); > trace_nbd_receive_starttls_tls_handshake(); > qio_channel_tls_handshake(tioc, > nbd_tls_handshake, > &data, > NULL, > NULL); > > if (!data.complete) { > g_main_loop_run(data.loop); > } > g_main_loop_unref(data.loop); > > > What this does in context of Qemu? Isn't it more correct to do > coroutine based async staff, like in qcow2_open(): > > if (qemu_in_coroutine()) { > /* From bdrv_co_create. */ > qcow2_open_entry(&qoc); > } else { > assert(qemu_get_current_aio_context() == qemu_get_aio_context()); > qemu_coroutine_enter(qemu_coroutine_create(qcow2_open_entry, &qoc)); > BDRV_POLL_WHILE(bs, qoc.ret == -EINPROGRESS); > } > return qoc.ret; > > And then yield after handshake() and enter back from nbd_tls_handshake > callback? > > Hmm, also, checked, nobody calls g_main_context_default() in qemu, except > util/main-loop.c, nbd and tests. So, I'm not sure that this is a valid thing > to do in nbd.. > >
Aha, we just don't have any bs in nbd/ code. But anyway, moving to AioContext and make negotiation non-blocking should be good idea. I'll try to do something around it. -- Best regards, Vladimir