On Fri, Aug 01, 2014 at 10:54:02AM +0800, Ming Lei wrote:
> On Fri, Aug 1, 2014 at 12:30 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> > Il 31/07/2014 18:13, Ming Lei ha scritto:
> >> Follows 'perf report' result on cycles event for with/without bypass
> >> coroutine:
> >>
> >> http://pastebin.com/ae0vnQ6V
> >>
> >> From the profiling result, looks bdrv_co_do_preadv() is a bit slow
> >> without bypass coroutine.
> >
> > Yeah, I can count at least 3.3% time spent here:
> >
> > 0.87% bdrv_co_do_preadv
> > 0.79% bdrv_aligned_preadv
> > 0.71% qemu_coroutine_switch
> > 0.52% tracked_request_begin
> > 0.45% coroutine_swap
> >
> > Another ~3% wasted in malloc, etc.
>
> That should be related with coroutine and the BH in bdrv_co_do_rw().
> In this post I didn't apply Stephan's coroutine resize patch which might
> decrease usage of malloc() for coroutine.

Please rerun with "[PATCH v3 0/2] coroutine: dynamically scale pool size".

> At least, coroutine isn't cheap from the profile result.

Instead of bypassing coroutines we should first understand the overhead
that they impose.  Is it due to the coroutine implementation (switching
stacks) or due to the bdrv_co_*() code that happens to use coroutines
but is slow for other reasons?
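
One way to separate the two is to time bare stack switches in isolation
and compare that against the per-request cost seen in the profile.  A
minimal sketch, assuming POSIX ucontext is available.  The names
measure_switch_ns() and co_entry() and the 64 KiB stack size are made up
for illustration; this is not QEMU code.  Note that swapcontext() on
glibc typically also saves and restores the signal mask, so the number
is likely an upper bound on a plain stack switch rather than a
measurement of QEMU's coroutine backend.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static long co_switches;            /* times the coroutine side has run */

/* Coroutine body: count the entry, then yield straight back. */
static void co_entry(void)
{
    for (;;) {
        co_switches++;
        swapcontext(&co_ctx, &main_ctx);
    }
}

/*
 * Returns the average cost in nanoseconds of one stack switch, or -1.0
 * on error.  Each loop iteration performs two switches: into the
 * coroutine and back out again.
 */
double measure_switch_ns(long iterations)
{
    enum { STACK_SIZE = 64 * 1024 };    /* illustrative size only */
    struct timespec t0, t1;
    char *stack;
    double ns;
    long i;

    if (iterations <= 0) {
        return -1.0;
    }
    stack = malloc(STACK_SIZE);
    if (!stack) {
        return -1.0;
    }
    if (getcontext(&co_ctx) < 0) {
        free(stack);
        return -1.0;
    }
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = STACK_SIZE;
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, co_entry, 0);

    co_switches = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iterations; i++) {
        swapcontext(&main_ctx, &co_ctx);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Sanity check: every entry really reached the coroutine. */
    assert(co_switches == iterations);

    /* The coroutine is suspended and never resumed, so this is safe. */
    free(stack);

    ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (2.0 * iterations);
}
```

Calling measure_switch_ns(10000000) from a small main() and printing the
result gives a per-switch figure.  If that is small compared to the time
attributed to bdrv_co_do_preadv() and friends, the overhead is more
likely in the bdrv_co_*() code than in the switch itself.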