Hi Kevin,

On Wed, Nov 26, 2014 at 10:46 PM, Kevin Wolf <kw...@redhat.com> wrote:
> This improves the performance of requests because an ACB doesn't need to
> be allocated on the heap any more. It also makes the code nicer and
> smaller.
I am not sure this is a good way to optimize linux-aio:

- for a raw image, under some constraints, the coroutine can be avoided,
  since io_submit() won't sleep most of the time
- handling a coroutine once takes much more time than a malloc, memset
  and free of a small buffer, according to the following test data:
  - 241ns per coroutine
  - 61ns per (malloc, memset, free) of 128 bytes

I still think we should figure out a fast path that avoids the coroutine
for linux-aio with raw images; otherwise it can't scale well for high-IOPS
devices.

Also, we could easily use a simple buffer pool to avoid the dynamic
allocation, couldn't we?

>
> As a side effect, the codepath taken by aio=threads is changed to use
> paio_submit_co(). This doesn't change the performance at this point.
>
> Results of qemu-img bench -t none -c 10000000 [-n] /dev/loop0:
>
>       |      aio=native       |     aio=threads
>       | before   | with patch | before   | with patch
> ------+----------+------------+----------+------------
> run 1 | 29.921s  | 26.932s    | 35.286s  | 35.447s
> run 2 | 29.793s  | 26.252s    | 35.276s  | 35.111s
> run 3 | 30.186s  | 27.114s    | 35.042s  | 34.921s
> run 4 | 30.425s  | 26.600s    | 35.169s  | 34.968s
> run 5 | 30.041s  | 26.263s    | 35.224s  | 35.000s
>
> TODO: Do some more serious benchmarking in VMs with less variance.
> Results of a quick fio run are vaguely positive.

I will test Paolo's fast-path approach with I/O running inside a VM.

Thanks,
Ming Lei
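Regarding the simple buffer pool suggested above, here is a minimal sketch in C. It assumes a fixed object size (128 bytes, as in the malloc/memset/free measurement) and that each pool is used from a single thread only (for example one pool per event loop), so there is no locking. ReqPool and the req_pool_* functions are hypothetical names for illustration, not existing QEMU APIs. All slots are carved out of one up-front allocation and recycled through an intrusive free list, so getting and putting a request object is a pointer pop/push plus the memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size object pool (not a QEMU type). Not thread-safe:
 * assumes one pool per thread / event loop. */
typedef struct ReqPool {
    void *free_list;   /* singly linked list of free slots */
    char *storage;     /* one up-front allocation backing all slots */
    size_t obj_size;   /* slot size, rounded up to pointer alignment */
} ReqPool;

/* Carve nr_objs slots out of a single malloc() and thread them onto the
 * free list. A free slot stores the "next" pointer in its first bytes. */
static int req_pool_init(ReqPool *pool, size_t obj_size, size_t nr_objs)
{
    assert(obj_size >= sizeof(void *));
    obj_size = (obj_size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

    pool->storage = malloc(obj_size * nr_objs);
    if (!pool->storage) {
        return -1;
    }
    pool->obj_size = obj_size;
    pool->free_list = NULL;
    for (size_t i = 0; i < nr_objs; i++) {
        void *slot = pool->storage + i * obj_size;
        *(void **)slot = pool->free_list;   /* push slot */
        pool->free_list = slot;
    }
    return 0;
}

/* Pop a zeroed slot, or return NULL if the pool is exhausted. */
static void *req_pool_get(ReqPool *pool)
{
    void *slot = pool->free_list;
    if (!slot) {
        return NULL;
    }
    pool->free_list = *(void **)slot;
    memset(slot, 0, pool->obj_size);
    return slot;
}

/* Return a slot obtained from req_pool_get() to the free list. */
static void req_pool_put(ReqPool *pool, void *slot)
{
    *(void **)slot = pool->free_list;
    pool->free_list = slot;
}

/* Release the backing storage; all slots become invalid. */
static void req_pool_destroy(ReqPool *pool)
{
    free(pool->storage);
    pool->storage = NULL;
    pool->free_list = NULL;
}
```

When the pool is empty, req_pool_get() returns NULL; a caller could fall back to malloc() in that case, at the cost of having to track which allocator owns each object when it is released.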