bdrv_flush() uses a loop like

    while (rwco.ret == NOT_DONE) {
        aio_poll(aio_context, true);
    }

to wait for the thread pool. The blocking qemu_poll_ns() inside aio_poll()
may not be woken up right away when the thread pool schedules its
completion BH, if no new event arrives on the polled fds. In that case,
the loop may even hang permanently.

Wake the main thread up explicitly by calling aio_notify(), which writes
to the event notifier fd.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Christian Borntraeger <borntrae...@de.ibm.com>
Signed-off-by: Fam Zheng <f...@redhat.com>

---

I suspect this may relate to

[Qemu-devel] "iothread: release iothread around aio_poll" causes random
hangs at startup

[http://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg00623.html]

reported by Christian Borntraeger. In an iothread there is rarely any fd
activity, so the blocking aio_poll() may block forever if it misses the
BH schedule.

Christian, could you test this patch against your reproducer?
---
 thread-pool.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/thread-pool.c b/thread-pool.c
index ac909f4..9b9c065 100644
--- a/thread-pool.c
+++ b/thread-pool.c
@@ -112,6 +112,7 @@ static void *worker_thread(void *opaque)
         qemu_mutex_lock(&pool->lock);
 
         qemu_bh_schedule(pool->completion_bh);
+        aio_notify(pool->ctx);
     }
 
     pool->cur_threads--;
-- 
2.4.3