15.03.2021 09:06, Roman Kagan wrote:
The reconnection logic doesn't need to stop while in a drained section. Moreover, it has to be active during the drained section, as the requests that were caught in-flight with the connection to the server broken can only usefully get drained if the connection is restored. Otherwise such requests can only either stall, resulting in a deadlock (before 8c517de24a), or be aborted, defeating the purpose of the reconnection machinery (after 8c517de24a).

This series aims to just stop messing with the drained section in the reconnection code. While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict connection_co reentrance"); as I've missed the point of that commit I'd appreciate more scrutiny in this area.
The actual point is: connection_co (together with all functions called from it) has a lot of yield points. And we can't just enter the coroutine at any of them whenever we want, as that may break some BH which is actually being waited for at that yield point. Still, we only need to care about yield points that are possible during a drained section, so we don't need to care about the direct qemu_coroutine_yield() inside nbd_connection_entry(). Many things have changed since 5ad81b4946, so probably all the yield points in nbd_connection_entry() that are possible during a drained section now support reentering. But some analysis of the possible yield points should be done.
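
To make the concern concrete, here is a toy model in plain C. It is only a sketch and not QEMU code: ToyCoroutine, YieldPoint, toy_enter() and toy_enter_if_safe() are all hypothetical names, and the "coroutine" is just a struct recording where it is parked. It assumes two kinds of yield points: one that waits for a specific BH, and one that tolerates being entered by anyone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical, none of it is QEMU API. */

typedef enum {
    YIELD_NONE,      /* not parked: running or finished */
    YIELD_WAIT_BH,   /* parked until one specific BH has completed */
    YIELD_WAIT_ANY,  /* parked at a point that tolerates any wakeup */
} YieldPoint;

typedef struct {
    YieldPoint at;   /* where the coroutine is currently parked */
    bool bh_done;    /* set by the BH that a YIELD_WAIT_BH point waits for */
    bool corrupted;  /* set when the coroutine resumed on a false premise */
} ToyCoroutine;

/* Enter the coroutine unconditionally, wherever it happens to be parked. */
static void toy_enter(ToyCoroutine *co)
{
    assert(co != NULL && co->at != YIELD_NONE);
    if (co->at == YIELD_WAIT_BH && !co->bh_done) {
        /* The code after this yield assumes the BH already ran. */
        co->corrupted = true;
    }
    co->at = YIELD_NONE;
}

/*
 * Enter the coroutine only if its current yield point supports reentering,
 * or if the event it waits for has already happened.
 * Returns true if the coroutine was entered, false if it was left parked.
 */
static bool toy_enter_if_safe(ToyCoroutine *co)
{
    assert(co != NULL);
    if (co->at == YIELD_NONE) {
        return false;            /* nothing to wake */
    }
    if (co->at == YIELD_WAIT_BH && !co->bh_done) {
        return false;            /* entering here would break the waiter */
    }
    toy_enter(co);
    return true;
}
```

In this model, a drain-time wakeup that goes through toy_enter_if_safe() leaves a coroutine parked at YIELD_WAIT_BH alone, while a bare toy_enter() marks it corrupted. The analysis asked for above amounts to checking which category each real yield point falls into.
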
Roman Kagan (7):
  block/nbd: avoid touching freed connect_thread
  block/nbd: use uniformly nbd_client_connecting_wait
  block/nbd: assert attach/detach runs in the proper context
  block/nbd: transfer reconnection stuff across aio_context switch
  block/nbd: better document a case in nbd_co_establish_connection
  block/nbd: decouple reconnect from drain
  block/nbd: stop manipulating in_flight counter

 block/nbd.c  | 191 +++++++++++++++++++++++---------------------------
 nbd/client.c |   2 -
 2 files changed, 86 insertions(+), 107 deletions(-)

-- 
Best regards,
Vladimir
