nbd: decouple reconnect from drain

Vladimir Sementsov-Ogievskiy Mon, 15 Mar 2021 13:10:42 -0700

15.03.2021 09:06, Roman Kagan wrote:

The reconnection logic doesn't need to stop while in a drained section.
Moreover, it has to be active during the drained section, as the requests
that were caught in-flight with the connection to the server broken can
only usefully get drained if the connection is restored. Otherwise, such
requests can only either stall, resulting in a deadlock (before
8c517de24a), or be aborted, defeating the purpose of the reconnection
machinery (after 8c517de24a).


Since the pieces of the reconnection logic are now properly migrated
from one aio_context to another, it appears safe to just stop messing
with the drained section in the reconnection code.

Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")


I don't think it "fixes" it. The behavior changes, but 5ad81b4946 didn't
introduce any bugs.

Fixes: 8c517de24a ("block/nbd: fix drain dead-lock because of nbd reconnect-delay")


The same goes for this one:

1. There is an existing problem in QEMU (unrelated to nbd): a long I/O request
that we have to wait for at drained_begin may trigger a deadlock
(https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html).

2. So, with nbd reconnect (and therefore long I/O requests), we simply hit
this deadlock. That's why I decided to cancel the requests, assuming they
will most probably fail anyway.

I agree that the nbd driver is the wrong place to fix the problem described in
https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html, but if
you just revert 8c517de24a, you'll see the deadlock again.
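A toy model may make the trade-off above more concrete. This is not QEMU code: DrainPolicy, simulate_drain and the poll counting are all hypothetical. A drain is modelled as polling until no request is in flight, the connection is broken when the drained section begins, and hitting max_polls stands in for a deadlock. The sketch contrasts the three behaviours discussed in this thread: stalling, aborting in-flight requests, and keeping reconnect active during the drained section.

```python
from dataclasses import dataclass
from enum import Enum, auto


class DrainPolicy(Enum):
    """Hypothetical reconnect behaviours during a drained section."""
    PAUSE_RECONNECT = auto()    # requests stall (described for before 8c517de24a)
    ABORT_IN_FLIGHT = auto()    # requests are aborted (described for after 8c517de24a)
    KEEP_RECONNECTING = auto()  # reconnect stays active (what the patch proposes)


@dataclass
class DrainResult:
    completed: int = 0      # requests that finished after the connection came back
    failed: int = 0         # requests aborted with an error
    stalled: bool = False   # drain never saw zero in-flight requests


def simulate_drain(policy, in_flight, reconnect_polls, max_polls=1000):
    """Toy model: drain polls until no request is in flight.

    The connection is broken when the drained section begins and comes
    back at poll number `reconnect_polls`, but only if reconnect is
    allowed to run. Reaching `max_polls` stands in for a deadlock.
    """
    result = DrainResult()
    if policy is DrainPolicy.ABORT_IN_FLIGHT:
        # Drain finishes at once, but nothing benefits from a later reconnect.
        result.failed = in_flight
        return result

    connected = False
    for poll in range(1, max_polls + 1):
        if in_flight == 0:
            break
        if not connected and policy is DrainPolicy.KEEP_RECONNECTING:
            connected = poll >= reconnect_polls
        if connected:
            # With the connection restored, one request completes per poll.
            in_flight -= 1
            result.completed += 1
    result.stalled = in_flight > 0
    return result


for policy in DrainPolicy:
    print(policy.name, simulate_drain(policy, in_flight=3, reconnect_polls=5))
```

With in_flight=3 and reconnect_polls=5, KEEP_RECONNECTING completes all three requests, PAUSE_RECONNECT stalls and ABORT_IN_FLIGHT fails them all. Passing a reconnect_polls larger than max_polls (a server that stays unreachable) makes KEEP_RECONNECTING stall as well, which mirrors the long-request case in point 1: keeping reconnect active only helps drain if the connection actually comes back while drain is waiting.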



--
Best regards,
Vladimir

Re: [PATCH 6/7] block/nbd: decouple reconnect from drain
