On Thu, Aug 7, 2014 at 7:06 PM, Kevin Wolf <kw...@redhat.com> wrote:
> Am 07.08.2014 um 12:52 hat Ming Lei geschrieben:
>> On Thu, Aug 7, 2014 at 6:27 PM, Ming Lei <ming....@canonical.com> wrote:
>> > On Wed, Aug 6, 2014 at 11:40 PM, Kevin Wolf <kw...@redhat.com> wrote:
>>
>> > Also there are some problems with your patches which can't boot a
>> > VM in my environment:
>> >
>> > - __thread patch: looks like there is no '__thread' used, and the patch
>> >   basically makes bypass not workable.
>> >
>> > - the bdrv_co_writev callback isn't set for raw-posix; looks like my
>> >   rootfs needs to write during booting
>> >
>> > - another problem, which I am investigating: laiocb isn't accessible
>> >   in qemu_laio_process_completion() sometimes
>>
>> This one should be caused by accessing 'laiocb' after cb().
>
> I stumbled across the same problems this morning when I tried to
> actually run VMs with it instead of just qemu-img bench. They should all
> be fixed in my git repo now. (Haven't figured out yet why __thread
> doesn't work, so I have reverted that part, probably at the cost of some
> performance.)
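BTW, to make the laiocb issue above concrete: it is the usual pattern of
touching a request after its completion callback may have released it. A
rough sketch of the pattern follows (illustrative names only, not the
actual linux-aio.c code):

    /* Illustrative sketch only -- not the actual QEMU linux-aio.c code.
     * If cb() releases the request (directly or by dropping the last
     * reference), any access to 'laiocb' after the call is use-after-free. */
    struct laiocb_sketch {
        void (*cb)(void *opaque, int ret);
        void *opaque;
        int ret;
    };

    static void process_completion_buggy(struct laiocb_sketch *laiocb)
    {
        laiocb->cb(laiocb->opaque, laiocb->ret); /* cb() may free laiocb... */
        laiocb->ret = 0;                         /* ...so this can explode */
    }

    static void process_completion_fixed(struct laiocb_sketch *laiocb)
    {
        /* Copy everything out of the request first, invoke the callback
         * as the very last step, and never touch laiocb afterwards. */
        void (*cb)(void *, int) = laiocb->cb;
        void *opaque = laiocb->opaque;
        int ret = laiocb->ret;

        cb(opaque, ret);
    }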
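And on the __thread revert: the two ways of keeping a per-thread pointer
(say, the current coroutine) differ roughly as below. This is only a
sketch under my assumptions; the names are made up, and real code would
need pthread_key_create() once during startup:

    #include <pthread.h>

    typedef struct Coroutine Coroutine;

    /* Variant 1: compiler-managed TLS; the fast path is a plain
     * TLS-relative load, no function call. This is the '__thread'
     * part that had to be reverted. */
    static __thread Coroutine *current_co;

    static Coroutine *current_tls(void)
    {
        return current_co;
    }

    /* Variant 2: pthread TLS; one library call per access. The key
     * must be set up with pthread_key_create() before first use. */
    static pthread_key_t co_key;

    static Coroutine *current_pthread(void)
    {
        return pthread_getspecific(co_key);
    }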
In my test, the commit has no obvious performance effect, nor does
pthread_getspecific(), which should be fine for the fast path. I also
simply reverted it since __thread can't be added. Interestingly, my other
local change is basically the same as yours. :-)

Finally, I implemented the coroutine bypass on top of your linux-aio
coroutine patches so that the bypass effect can be compared easily; both
now run in basically the same path except for the coroutine APIs:

    git://kernel.ubuntu.com/ming/qemu.git  v2.1.0-mq.1-kevin-perf

The above branch holds only three patches, which are against the latest
'perf-bypass' branch of your tree.

Then I ran it in a VM on my server, still using the same fio workload
(linux aio, direct, 4k bs, 120 sec) to test virtio-blk dataplane
performance; the virtio-blk device is again backed by the /dev/nullb0
block device.

                 | without bypass (linux-aio coro) | with bypass (linux-aio coro)
    -------------+---------------------------------+-----------------------------
    1 vq, 2 jobs | 101K iops                       | 116K iops
    -------------+---------------------------------+-----------------------------
    4 vq, 4 jobs | 121K iops                       | 142K iops

So there is still some difference even with the linux-aio coroutine
patches applied. Now I am a bit more confident that the coroutine is the
cause of the performance difference...

Thanks,
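P.S. For completeness, the fio job is along the lines below. Only the
parameters in parentheses above (libaio, direct, 4k bs, 120 sec, and the
job counts) are from my actual runs; the rw pattern, iodepth, and guest
device name here are just placeholders:

    [global]
    ioengine=libaio
    direct=1
    bs=4k
    runtime=120
    time_based
    group_reporting

    [virtio-blk-test]
    # filename, rw and iodepth are placeholders, not the exact values used
    filename=/dev/vdb
    rw=randread
    iodepth=64
    # numjobs=2 for the "1 vq, 2 jobs" row, numjobs=4 for "4 vq, 4 jobs"
    numjobs=2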