[Qemu-devel] [PATCH 0/3] linux-aio: reduce completion latency

Roman Pen Tue, 19 Jul 2016 03:28:39 -0700

This series is intended to reduce completion latency through two changes:

1. QEMU does not use any timeout value for harvesting completed AIO
   requests from the ring buffer, thus io_getevents() can be implemented
   in userspace (first patch).


2. To reduce completion latency it makes sense to harvest completed
   requests ASAP.  A very fast backend device can complete requests right
   after submission, so it is worth checking the ring buffer and harvesting
   completed requests directly after io_submit() has been called (third
   patch).
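
The idea behind both changes can be sketched with a simplified model of a
completion ring.  This is not the code from the patches: the names
struct ring, struct event, ring_peek(), ring_commit(), ring_complete() and
harvest() are hypothetical, and ring_complete() only stands in for the
producer side (the kernel in the real setup).  The sketch assumes a single
consumer and that an empty ring is indicated by head == tail.

```c
#include <assert.h>
#include <stdatomic.h>

#define RING_SLOTS 8

/* Hypothetical, simplified completion event. */
struct event {
    long data;
    long res;
};

/* Simplified completion ring: producer advances tail, consumer advances head. */
struct ring {
    unsigned nr;              /* number of slots */
    _Atomic unsigned head;    /* next slot to consume */
    _Atomic unsigned tail;    /* next slot to fill */
    struct event events[RING_SLOTS];
};

/* Producer stand-in: publish one completion, return -1 if the ring is full. */
static int ring_complete(struct ring *r, long data, long res)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned next = (tail + 1) % r->nr;

    if (next == atomic_load_explicit(&r->head, memory_order_acquire)) {
        return -1;
    }
    r->events[tail] = (struct event){ data, res };
    atomic_store_explicit(&r->tail, next, memory_order_release);
    return 0;
}

/*
 * Return the number of contiguous completed events starting at head and
 * point *events at the first one.  Never blocks and makes no system call.
 */
static unsigned ring_peek(struct ring *r, struct event **events)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail) {
        return 0;
    }
    *events = &r->events[head];
    /* If the data wraps, report only the part up to the end of the array. */
    return (head < tail ? tail : r->nr) - head;
}

/* Hand nr consumed slots back to the producer. */
static void ring_commit(struct ring *r, unsigned nr)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);

    assert(nr < r->nr);
    atomic_store_explicit(&r->head, (head + nr) % r->nr, memory_order_release);
}

/* Drain everything that is currently completed; returns the event count. */
static unsigned harvest(struct ring *r, long *res_sum)
{
    struct event *ev;
    unsigned n, total = 0;

    while ((n = ring_peek(r, &ev)) > 0) {
        for (unsigned i = 0; i < n; i++) {
            *res_sum += ev[i].res;
        }
        ring_commit(r, n);
        total += n;
    }
    return total;
}
```

In this model, the second change corresponds to calling harvest() right
after the submission step instead of waiting for a separate completion
notification, so completions from a fast backend are picked up in smaller
batches.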

Indeed, the series reduces completion latency and increases overall
throughput.  For example, the following are the percentiles of the number
of requests completed at once:

        1st 10th  20th  30th  40th  50th  60th  70th  80th  90th  99.99th
Before    2    4    42   112   128   128   128   128   128   128    128
 After    1    1     4    14    33    45    47    48    50    51    108

This means that, before the third patch is applied, the ring buffer is
observed to be full (128 requests were consumed at once) in about 60% of
calls.

After the third patch is applied, the distribution of the number of
completed requests is "smoother" and the queue (requests in flight) is
almost never full.

The fio read results are as follows (write results are almost the
same and are not shown here):

  Before
  ------
job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec
    slat (usec): min=172, max=16883, avg=338.35, stdev=109.66
    clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29
     lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73
    clat percentiles (usec):
     |  1.00th=[  346],  5.00th=[  596], 10.00th=[  708], 20.00th=[  852],
     | 30.00th=[  932], 40.00th=[  996], 50.00th=[ 1048], 60.00th=[ 1112],
     | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496],
     | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672],
     | 99.99th=[ 4704]
    bw (KB  /s): min=205229, max=553181, per=12.50%, avg=233278.26, stdev=18383.51

  After
  ------
job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec
    slat (usec): min=169, max=20636, avg=329.61, stdev=124.18
    clat (usec): min=2, max=19592, avg=988.78, stdev=251.04
     lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  580], 10.00th=[  748], 20.00th=[  876],
     | 30.00th=[  908], 40.00th=[  948], 50.00th=[ 1012], 60.00th=[ 1064],
     | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288],
     | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256],
     | 99.99th=[ 5408]
    bw (KB  /s): min=212149, max=390160, per=12.49%, avg=245746.04, stdev=11606.75

Throughput increased from 1822MB/s to 1921MB/s, and the average completion
latency decreased from 1051us to 988us.
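
For reference, the relative change can be computed from the fio numbers
above.  The helper rel_change_pct() is a hypothetical name used only for
this illustration.  With the reported values it gives roughly +5.4% for
bandwidth and roughly -6.0% for average clat.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent. */
static double rel_change_pct(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Values taken from the fio output above. */
static const double bw_before_mbs   = 1822.7;   /* bw, MB/s */
static const double bw_after_mbs    = 1921.2;
static const double clat_before_us  = 1051.45;  /* avg clat, usec */
static const double clat_after_us   = 988.78;
```

For example, rel_change_pct(bw_before_mbs, bw_after_mbs) yields the
bandwidth gain and rel_change_pct(clat_before_us, clat_after_us) yields the
(negative) change in average completion latency.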

Roman Pen (3):
  linux-aio: consume events in userspace instead of calling io_getevents
  linux-aio: split processing events function
  linux-aio: process completions from ioq_submit()

 block/linux-aio.c | 175 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 137 insertions(+), 38 deletions(-)

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: qemu-devel@nongnu.org
-- 
2.8.2
