On Thu, Nov 17, 2016 at 08:15:11AM -0600, Karl Rister wrote:
> I think these results look a bit more in line with expectations on the
> quick sniff test:
>
> QEMU_AIO_POLL_MAX_NS      IOPs
>                unset    26,299
>                    1    25,929
>                    2    25,753
>                    4    27,214
>                    8    27,053
>                   16    26,861
>                   32    24,752
>                   64    25,058
>                  128    24,732
>                  256    25,560
>                  512    24,614
>                1,024    25,186
>                2,048    25,829
>                4,096    25,671
>                8,192    27,896
>               16,384    38,086
>               32,768    35,493
>               65,536    38,496
>              131,072    38,296
>
> I did a spot check of CPU utilization when the polling started having
> benefits.
>
> Without polling (QEMU_AIO_POLL_MAX_NS=unset) the iothread's CPU usage
> looked like this:
>
> user time:    25.94%
> system time:  22.11%
>
> With polling and QEMU_AIO_POLL_MAX_NS=16384 the iothread's CPU usage
> looked like this:
>
> user time:    78.92%
> system time:  20.80%
Excellent, now there are two optimizations remaining that could be useful:

Christian suggested disabling virtqueue notifications while polling.
This will reduce vmexits and avoid useless ioeventfd activity after
we've already polled.

Paolo suggested skipping the ppoll(2) or epoll_wait(2) call if polling
made progress.

These will be in v3.

Your results prove that the virtqueue kick is slow.  (I think the Linux
AIO completion isn't the bottleneck, but we also poll for that.)

I'm still hesitant about adding polling to QEMU because tuning
QEMU_AIO_POLL_MAX_NS= is difficult.  Benchmarks will achieve higher
numbers but actual users will benefit less.

Is it time to drill down on why the virtqueue kick + ioeventfd
mechanism is so slow?  Polling achieved >40% IOPS improvements and I
wonder where that time is lost with ioeventfd.

Stefan
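For reference, here is a minimal C sketch of how the two proposed
optimizations could fit into an event loop iteration.  This is not
QEMU's actual aio_poll() code; poll_handlers() and
set_event_notification() are hypothetical stand-ins for running the
registered poll callbacks and toggling virtqueue/ioeventfd
notification, and poll_max_ns stands in for the value taken from
QEMU_AIO_POLL_MAX_NS:

#define _GNU_SOURCE
#include <stdbool.h>
#include <stdint.h>
#include <poll.h>
#include <time.h>

extern int64_t poll_max_ns;                 /* from QEMU_AIO_POLL_MAX_NS */
bool poll_handlers(void);                   /* hypothetical: run poll callbacks, true on progress */
void set_event_notification(bool enabled);  /* hypothetical: toggle virtqueue/ioeventfd notify */

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

bool event_loop_iteration(struct pollfd *fds, nfds_t nfds)
{
    bool progress = false;

    if (poll_max_ns > 0) {
        int64_t deadline = now_ns() + poll_max_ns;

        /* Optimization 1: suppress guest notifications while busy-polling,
         * so the guest avoids vmexits for kicks we would catch by polling
         * anyway. */
        set_event_notification(false);
        do {
            progress |= poll_handlers();
        } while (!progress && now_ns() < deadline);
        set_event_notification(true);
    }

    /* Optimization 2: skip the blocking ppoll(2) for this iteration if
     * polling already made progress. */
    if (!progress) {
        ppoll(fds, nfds, NULL, NULL);
    }
    return progress;
}

The sketch only illustrates the shape of the change: notifications stay
disabled for the duration of the busy-poll window, and the syscall is
elided whenever the poll window did useful work.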