On Jun 27, 2016 6:37 PM, "Stefan Hajnoczi" <stefa...@redhat.com> wrote: > > Commit ccb9dc10129954d0bcd7814298ed445e684d5a2a ("linux-aio: Cancel BH > if not needed") exposed a side-effect of scheduling the BH for nested > event loops. When io_getevents(2) fetches MAX_EVENTS events then it's > necessary to call it again. Failure to do so can lead to hung I/O > because the eventfd has already been cleared and the completion BH will > not run again. > > This patch fixes the hang by calling io_getevents(2) again if the events > array was totally full. > > Reported-by: Roman Penyaev <roman.peny...@profitbricks.com> > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com> > --- > block/linux-aio.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/block/linux-aio.c b/block/linux-aio.c > index e468960..af15f85 100644 > --- a/block/linux-aio.c > +++ b/block/linux-aio.c > @@ -117,6 +117,7 @@ static void qemu_laio_completion_bh(void *opaque) > LinuxAioState *s = opaque; > > /* Fetch more completion events when empty */ > +more_events: > if (s->event_idx == s->event_max) { > do { > struct timespec ts = { 0 }; > @@ -146,6 +147,10 @@ static void qemu_laio_completion_bh(void *opaque) > qemu_laio_process_completion(laiocb); > } > > + if (s->event_idx == MAX_EVENTS) { > + goto more_events; /* there might still be events waiting for us */ > + } > +
I am on vacation and can't check anything, but seems you also could reproduce an issue and seems you are right: what I've also noticed is that io hangs only if total number of requests in guest > than 128 (what is defined in linux-aio.c). But I do not understand why you have to repeat io_getevents in case of return value == MAX_EVENTS. According to my understanding you cant submit more than 128, so the first io_getevents call returns you exactly what you've submitted. -- Roman > if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) { > ioq_submit(s); > } > -- > 2.7.4 >