On Wed, Feb 03, 2021 at 05:44:34PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 03.02.2021 17:21, Roman Kagan wrote:
> > On Wed, Feb 03, 2021 at 04:10:41PM +0300, Vladimir Sementsov-Ogievskiy
> > wrote:
> > > 03.02.2021 13:53, Roman Kagan wrote:
> > > > On Thu, Sep 03, 2020 at 10:02:58PM +0300, Vladimir Sementsov-Ogievskiy
> > > > wrote:
> > > > > We pause the reconnect process during a drained section. So if we
> > > > > have requests waiting for the reconnect, we should cancel them;
> > > > > otherwise they deadlock the drained section.
> > > > >
> > > > > How to reproduce:
> > > > >
> > > > > 1. Create an image:
> > > > >    qemu-img create -f qcow2 xx 100M
> > > > >
> > > > > 2. Start an NBD server:
> > > > >    qemu-nbd xx
> > > > >
> > > > > 3. Start a VM with a second NBD disk on node2, like this:
> > > > >
> > > > >    ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
> > > > >      file=/work/images/cent7.qcow2 -drive \
> > > > >      driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
> > > > >      -vnc :0 -m 2G -enable-kvm -vga std
> > > > >
> > > > > 4. Access the VM through VNC (or some other way?) and check that the
> > > > >    NBD drive works:
> > > > >
> > > > >    dd if=/dev/sdb of=/dev/null bs=1M count=10
> > > > >
> > > > >    - the command should succeed.
> > > > >
> > > > > 5. Now kill the NBD server and run dd in the guest again:
> > > > >
> > > > >    dd if=/dev/sdb of=/dev/null bs=1M count=10
> > > > >
> > > > > Now QEMU is trying to reconnect, and the dd-generated requests are
> > > > > waiting for the connection (they will wait up to 60 seconds, see the
> > > > > reconnect-delay option above, and then fail). But instead the VM may
> > > > > hang completely in the deadlock. You may need to increase the
> > > > > reconnect-delay period to catch the deadlock.
> > > > >
> > > > > The VM doesn't respond because the drain deadlock happens in the CPU
> > > > > thread with the global mutex taken. That's not a good thing by
> > > > > itself and is not fixed by this commit (the real fix is using
> > > > > iothreads). Still, this commit fixes the drain deadlock itself.
> > > > >
> > > > > Note: probably, we could instead continue to reconnect during the
> > > > > drained section. To achieve this, we could move negotiation to the
> > > > > connect thread to make it independent of the bs aio context. But
> > > > > expanding the drained section doesn't seem good anyway. So, for now,
> > > > > let's fix the bug in the simplest way.
> > > > >
> > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
> > > > > ---
> > > > >  block/nbd.c | 5 +++++
> > > > >  1 file changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/block/nbd.c b/block/nbd.c
> > > > > index 9daf003bea..912ea27be7 100644
> > > > > --- a/block/nbd.c
> > > > > +++ b/block/nbd.c
> > > > > @@ -242,6 +242,11 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
> > > > >      }
> > > > >      nbd_co_establish_connection_cancel(bs, false);
> > > > > +
> > > > > +    if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
> > > > > +        s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> > > > > +        qemu_co_queue_restart_all(&s->free_sema);
> > > > > +    }
> > > > >  }
> > > > >
> > > > >  static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
> > > >
> > > > This basically defeats the whole purpose of reconnect: if the nbd
> > > > client is trying to reconnect, drain effectively cancels that and
> > > > makes all in-flight requests complete with an error.
> > > >
> > > > I'm not suggesting to revert this patch (it's now in the tree as
> > > > commit 8c517de24a), because the deadlock is no better, but I'm afraid
> > > > the only real fix is to implement reconnect during the drain section.
> > > > I'm still trying to get my head around it, so no patch yet, but I
> > > > just wanted to bring this up in case anybody beats me to it.
> > >
> > > What do you mean by "reconnect during drained section"? Trying to
> > > establish the connection? Or keeping in-flight requests instead of
> > > cancelling them? We can't keep in-flight requests during a drained
> > > section, as that is the whole point of a drained section: no in-flight
> > > requests. So we'll have to either wait for them at drain_begin
> > > (waiting up to reconnect-delay, which doesn't seem good and triggers
> > > the deadlock in a non-iothread environment) or just cancel them.
> > >
> > > Do you argue that waiting at drained-begin is somehow better than
> > > cancelling?
> >
> > Sorry, I should have used more accurate wording to be clear.
> >
> > Yes, my point is that cancelling the requests on entry to a drained
> > section is incorrect. And no, it doesn't matter that it can take long:
> > that's the price you pay for doing the drain. Think of reconnect as a
> > special case of a slow connection: if an NBD reply from the server is
> > delayed for whatever reason without dropping the connection, you have
> > to wait there, too. (In fact, reconnect is even slightly better here,
> > since it has a configurable finite timeout while the message delay
> > does not, AFAIK.)
> >
> > Does that make sense?
>
> Hmm, yes..
>
> But then we should fix the original deadlock some other way.
Exactly.

> Probably it will be possible only by using an iothread for the nbd node
> (I failed to find the original thread where someone said that iothreads
> are a solution). And then we could cancel in nbd_client_co_drain_begin()
> only if bs doesn't have a separate iothread.

Without this commit, I see qemu hang with nbd in an iothread, too. I'll
double-check whether it's that very same deadlock or something else.

Thanks,
Roman.
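
For readers following the thread, here is a minimal, self-contained sketch
of the state flip that the hunk quoted above performs in
nbd_client_co_drain_begin(). This is not QEMU code: the struct, the
"waiters" counter and the drain_begin() wrapper are invented for
illustration; only the state names and the transition itself come from the
patch.

    /* Toy model of commit 8c517de24a: on drain_begin, requests parked
     * waiting for a reconnect are switched from WAIT to NOWAIT and woken,
     * so they fail with an error instead of blocking the drain. */
    #include <stdio.h>

    typedef enum {
        NBD_CLIENT_CONNECTED,
        NBD_CLIENT_CONNECTING_WAIT,   /* requests queue up, wait for reconnect */
        NBD_CLIENT_CONNECTING_NOWAIT, /* requests fail immediately instead */
        NBD_CLIENT_QUIT,
    } NBDClientState;

    typedef struct {
        NBDClientState state;
        int waiters;                  /* stand-in for the free_sema wait queue */
    } BDRVNBDState;

    static void drain_begin(BDRVNBDState *s)
    {
        if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
            printf("waking %d queued requests; they will now fail\n",
                   s->waiters);
            s->waiters = 0;           /* models qemu_co_queue_restart_all() */
        }
    }

    int main(void)
    {
        BDRVNBDState s = { .state = NBD_CLIENT_CONNECTING_WAIT, .waiters = 3 };
        drain_begin(&s);
        return 0;
    }

In the real driver the wake-up is done by qemu_co_queue_restart_all() on
s->free_sema, as shown in the diff; the point of contention in this thread
is whether failing those requests on drain is acceptable at all.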