On Thursday, November 10, 2016 11:17:35 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> > I think the question is not "is there any polling to be done" but rather
> > "is there anything that requires looking at a file descriptor". If you
> > have e.g. an NBD device on the AioContext you cannot poll. On the other
> > hand if all you have is bottom halves (which you can poll with
> > ctx->notified), AIO and virtio ioeventfds, you can poll.
>
> This is a good point. Polling should only be done if all resources in
> the AioContext benefit from polling - otherwise it adds latency to
> resources that don't support polling.
>
> Another thing: only poll if there is work to be done. Linux AIO must
> only poll the ring when there are >0 requests outstanding. Currently it
> always polls (doh!).

Good idea. So the result of the polling callback could be one of ready,
not ready and not active? Or did you have in mind something else? To
make it concrete, I sketched below what I mean.
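Completely untested, and all of these names are invented (nothing like
this exists in the tree today):

    /* Hypothetical tri-state result for a polling handler. */
    typedef enum {
        AIO_POLL_INACTIVE,   /* nothing outstanding; skip this handler */
        AIO_POLL_NOT_READY,  /* requests outstanding, but no event yet */
        AIO_POLL_READY,      /* an event can be processed right now */
    } AioPollResult;

    typedef AioPollResult (*AioPollHandler)(void *opaque);

Linux AIO would then return AIO_POLL_INACTIVE whenever it has zero
requests in flight, and only look at the completion ring otherwise.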
> > In particular, testing for bottom halves is necessary to avoid incurring
> > extra latency on flushes, which use the thread pool.
>
> The current code uses a half-solution: it uses aio_compute_timeout() to
> see if any existing BHs are ready to execute *before* beginning to poll.
>
> Really we should poll BHs since they can be scheduled during the polling
> loop.

We should do so for correctness (hopefully with just ctx->notified:
there should be no need to walk the BH list during polling). However,
the user of the BH should activate polling "manually" by registering
its own polling handler: if there are no active polling handlers, just
ignore bottom halves and do the poll(). This is because there are
always a handful of registered bottom halves, but they are not
necessarily "activatable" from other threads.

For example, the thread pool always has one BH but, as you noticed for
Linux AIO, it may not have any pending requests. So the thread pool
would still have to register with aio_set_poll_handler, even if it
uses bottom halves internally for the signaling. I guess it would not
need to register an associated IOHandler, since it can just use
aio_bh_poll.

A couple more random observations:

- you can pass the output of aio_compute_timeout(ctx) to
  run_poll_handlers, e.g. MIN((uint64_t)aio_compute_timeout(ctx),
  (uint64_t)aio_poll_max_ns).

- since we know that all resources are pollable, we don't need to
  poll() at all if polling succeeds (though we do need
  aio_notify_accept() + aio_bh_poll()).

> > Perhaps the poll handler could be a parameter to aio_set_event_notifier?
> > run_poll_handlers can just set revents (to G_IO_IN for example) if the
> > polling handler returns true, and return true as well. aio_poll can
> > then call aio_notify_accept and aio_dispatch, bypassing the poll system
> > call altogether.
>
> This is problematic. The poll source != file descriptor so there is a
> race condition:
>
> 1. Guest increments virtqueue avail.idx
> 2. QEMU poll notices avail.idx update and marks fd.revents readable.
> 3. QEMU dispatches fd handler.
> 4. Guest kicks virtqueue -> ioeventfd is signalled
>
> Unfortunately polling is "too fast" and event_notifier_test_and_clear()
> returns false; we won't process the virtqueue!
>
> Pretending that polling is the same as fd monitoring only works when #4
> happens before #3. We have to solve this race condition.
>
> The simplest solution is to get rid of the if statement (i.e. enable
> spurious event processing). Not sure if that has a drawback though.
> Do you have a nicer solution in mind?

No, I don't. Removing the if seems sensible, but I like the polling
handler more now that I know why it's there. The
event_notifier_test_and_clear() call does add a small latency.

On one hand, because you need to check whether *all* "resources"
support polling, you need a common definition of "resource" (e.g.
aio_set_fd_handler). On the other hand, it would be nice to have a
streamlined polling callback. I guess you could add something like
aio_set_handler that takes a struct with all the interesting
callbacks:

- in/out callbacks (for aio_set_fd_handler)
- polling handler
- polling callback

Then there would be simplified interfaces on top, such as
aio_set_fd_handler, aio_set_event_notifier and your own
aio_set_poll_handler. A rough sketch follows my signature.

Paolo
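Again untested, with invented names, just to illustrate the shape of
aio_set_handler (IOHandler here is the existing "typedef void
IOHandler(void *opaque)"):

    typedef struct AioHandlerOps {
        IOHandler *io_read;          /* in callback, as in aio_set_fd_handler */
        IOHandler *io_write;         /* out callback */
        bool (*poll)(void *opaque);  /* polling handler: is there work? */
        IOHandler *poll_cb;          /* polling callback: consume the work */
    } AioHandlerOps;

    void aio_set_handler(AioContext *ctx, int fd,
                         void *opaque, const AioHandlerOps *ops);

aio_set_fd_handler and aio_set_event_notifier would only fill in
io_read/io_write, while aio_set_poll_handler would pass fd == -1 and
fill in only poll/poll_cb.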