On Mon, 11/14 16:29, Paolo Bonzini wrote:
>
> On 14/11/2016 16:26, Stefan Hajnoczi wrote:
> > On Fri, Nov 11, 2016 at 01:59:25PM -0600, Karl Rister wrote:
> >>  QEMU_AIO_POLL_MAX_NS      IOPs
> >>         unset            31,383
> >>             1            46,860
> >>             2            46,440
> >>             4            35,246
> >>             8            34,973
> >>            16            46,794
> >>            32            46,729
> >>            64            35,520
> >>           128            45,902
> >
> > The environment variable is in nanoseconds.  The range of values you
> > tried are very small (all <1 usec).  It would be interesting to try
> > larger values in the ballpark of the latencies you have traced.  For
> > example 2000, 4000, 8000, 16000, and 32000 ns.
> >
> > Very interesting that QEMU_AIO_POLL_MAX_NS=1 performs so well without
> > much CPU overhead.
>
> That basically means "avoid a syscall if you already know there's
> something to do", so in retrospect it's not that surprising.  Still
> interesting though, and it means that the feature is useful even if you
> don't have CPU to waste.
With the "deleted" bug fixed I did a little more testing to understand this. Setting QEMU_AIO_POLL_MAX_NS=1 doesn't mean run_poll_handlers() will only loop for 1 ns - the patch only checks at every 1024 polls. The first poll in a run_poll_handlers() call can hardly succeed, so we poll at least 1024 times. According to my test, on average each run_poll_handlers() takes ~12000ns, which is ~160 iterations of the poll loop, before geting a new event (either from virtio queue or linux-aio, I don't have the ratio here). So in the worse case (no new event), 1024 iterations is basically (12000 / 160 * 1024) = 76800 ns! The above is with iodepth=1 and jobs=1. With iodepth=32 and jobs=1, or iodepth=8 and jobs=4, the numbers are ~30th poll with 5600ns. Fam