On Wed, Nov 09, 2016 at 06:30:11PM +0100, Paolo Bonzini wrote:

Thanks for the feedback.  I hope that Karl will be able to find a
QEMU_AIO_POLL_MAX_NS setting that improves the benchmark.  At that point
I'll send a new version of this series so we can iron out the details.
> > +static bool run_poll_handlers(AioContext *ctx)
> > +{
> > +    int64_t start_time;
> > +    unsigned int loop_count = 0;
> > +    bool fired = false;
> > +
> > +    /* Is there any polling to be done? */
>
> I think the question is not "is there any polling to be done" but rather
> "is there anything that requires looking at a file descriptor".  If you
> have e.g. an NBD device on the AioContext you cannot poll.  On the other
> hand if all you have is bottom halves (which you can poll with
> ctx->notified), AIO and virtio ioeventfds, you can poll.

This is a good point.  Polling should only be done if all resources in
the AioContext benefit from polling - otherwise it adds latency to
resources that don't support polling.

Another thing: only poll if there is work to be done.  Linux AIO must
only poll the ring when there are >0 requests outstanding.  Currently it
always polls (doh!).

> In particular, testing for bottom halves is necessary to avoid incurring
> extra latency on flushes, which use the thread pool.

The current code uses a half-solution: it uses aio_compute_timeout() to
see if any existing BHs are ready to execute *before* beginning to poll.
Really we should poll BHs since they can be scheduled during the polling
loop.

> Perhaps the poll handler could be a parameter to aio_set_event_notifier?
> run_poll_handlers can just set revents (to G_IO_IN for example) if the
> polling handler returns true, and return true as well.  aio_poll can
> then call aio_notify_accept and aio_dispatch, bypassing the poll system
> call altogether.

This is problematic.  The poll source != file descriptor, so there is a
race condition:

1. Guest increments virtqueue avail.idx

2. QEMU poll notices avail.idx update and marks fd.revents readable.

3. QEMU dispatches fd handler:

   void virtio_queue_host_notifier_read(EventNotifier *n)
   {
       VirtQueue *vq = container_of(n, VirtQueue, host_notifier);

       if (event_notifier_test_and_clear(n)) {
           virtio_queue_notify_vq(vq);
       }
   }

4. Guest kicks virtqueue -> ioeventfd is signalled

Unfortunately polling is "too fast" and event_notifier_test_and_clear()
returns false; we won't process the virtqueue!

Pretending that polling is the same as fd monitoring only works when #4
happens before #3.  We have to solve this race condition.

The simplest solution is to get rid of the if statement (i.e. enable
spurious event processing).  Not sure if that has a drawback, though.
Do you have a nicer solution in mind?
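
For illustration, dropping the guard would look roughly like this: the
same handler as quoted in step 3, with event_notifier_test_and_clear()
kept only to clear the notifier and the virtqueue always processed.
This is a sketch of the "spurious event processing" idea, not tested
code:

   void virtio_queue_host_notifier_read(EventNotifier *n)
   {
       VirtQueue *vq = container_of(n, VirtQueue, host_notifier);

       /* Clear the notifier but ignore the result: a false return is
        * treated as a spurious wakeup and the virtqueue is processed
        * anyway, which closes the poll-vs-ioeventfd race described
        * above.
        */
       event_notifier_test_and_clear(n);
       virtio_queue_notify_vq(vq);
   }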
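
On the "only poll if there is work to be done" point above, the Linux
AIO poll handler could guard its ring check with something along these
lines.  The names (laio_poll_cb, io_pending, ring_has_completions(),
process_completions()) are illustrative placeholders, not the actual
linux-aio.c identifiers:

   static bool laio_poll_cb(void *opaque)
   {
       LinuxAioState *s = opaque;

       /* Nothing submitted, so the completion ring cannot have work */
       if (s->io_pending == 0) {
           return false;
       }

       /* Requests are outstanding but none have completed yet */
       if (!ring_has_completions(s)) {
           return false;
       }

       process_completions(s);  /* dispatch the completed requests */
       return true;             /* progress was made by polling */
   }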