On 4/16/22 13:54, Vladimir Sementsov-Ogievskiy wrote:
@@ -187,9 +187,6 @@ static void reconnect_delay_timer_cb(void *opaque)
if (qatomic_load_acquire(&s->state) ==
NBD_CLIENT_CONNECTING_WAIT) {
s->state = NBD_CLIENT_CONNECTING_NOWAIT;
nbd_co_establish_connection_cancel(s->conn);
- while (qemu_co_enter_next(&s->free_sema, NULL)) {
- /* Resume all queued requests */
- }
}
reconnect_delay_timer_del(s);
Seems, removing the timer is not needed here too. We do
nbd_co_establish_connection_cancel(), and timer will be removed in
nbd_reconnect_attempt anyway.
More over, we don't need to keep timer in the state at all: it should
become local thing for nbd_reconnect_attempt. A kind of call
nbd_co_do_establish_connection() with timeout. That could be refactored
later, using same way as in 4-5 patches of my "[PATCH v4 0/7]
copy-before-write: on-cbw-error and cbw-timeout" series.
Yes, good point.
nbd_reconnect_attempt(BDRVNBDState *s)
s->ioc = NULL;
}
- nbd_co_do_establish_connection(s->bs, NULL);
+ qemu_co_mutex_unlock(&s->send_mutex);
+ nbd_co_do_establish_connection(s->bs, blocking, NULL);
+ qemu_co_mutex_lock(&s->send_mutex);
Hmm. So, that breaks a critical section.
We can do it because in_flight=1 and we are not connected => all other
requests will wait in the queue.
Still. during nbd_co_do_establish_connection() state may become
CONNECTED. That theoretically means that parallel requests may start.
Is it bad? Seems not.. Bad thing that comes into my mind is that
parallel request fails, and go to reconnect, and start own timer, but we
remove it now after locking the mutex back. But that's not possible as
parallel request should wait for in-flight=0 to start own
reconnect-attempt. And that is not possible, as we keep our in-flight
point.
Yes that was even intended. :> Synchronizing on in_flight == 0 before
reconnecting is the important bit that simplifies a lot of things.
@@ -2049,7 +2046,6 @@ static void
nbd_cancel_in_flight(BlockDriverState *bs)
if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
s->state = NBD_CLIENT_CONNECTING_NOWAIT;
- qemu_co_queue_restart_all(&s->free_sema);
As I understand, we always have one request running (or no requests at
all, but that's OK too) and it will wake all others (altogether, or they
will wake each other in a chain). So, we don't need to wake them here.
Exactly, the "let each coroutine wake up the next one" pattern can be
generalized to the error cases because the wakeups cascade until
in_flight becomes 0.
Paolo
}
nbd_co_establish_connection_cancel(s->conn);