On Fri, Aug 1, 2014 at 9:13 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> On Fri, Aug 01, 2014 at 10:54:02AM +0800, Ming Lei wrote:
>> On Fri, Aug 1, 2014 at 12:30 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>> > On 31/07/2014 18:13, Ming Lei wrote:
>> >> Follows 'perf report' result on cycles event for with/without bypass
>> >> coroutine:
>> >>
>> >> http://pastebin.com/ae0vnQ6V
>> >>
>> >> From the profiling result, looks bdrv_co_do_preadv() is a bit slow
>> >> without bypass coroutine.
>> >
>> > Yeah, I can count at least 3.3% time spent here:
>> >
>> >     0.87%  bdrv_co_do_preadv
>> >     0.79%  bdrv_aligned_preadv
>> >     0.71%  qemu_coroutine_switch
>> >     0.52%  tracked_request_begin
>> >     0.45%  coroutine_swap
>> >
>> > Another ~3% wasted in malloc, etc.
>>
>> That should be related with coroutine and the BH in bdrv_co_do_rw().
>> In this post I didn't apply Stefan's coroutine resize patch which might
>> decrease usage of malloc() for coroutine.
>
> Please rerun with "[PATCH v3 0/2] coroutine: dynamically scale pool
> size".
No problem, will do that. Actually, my last RFC post of this patchset was
already applied on top of your coroutine resize patches. I will provide the
profile data tomorrow.

>
>> At least, coroutine isn't cheap from the profile result.
>
> Instead of bypassing coroutines we should first understand the overhead
> that they impose. Is it due to the coroutine implementation (switching
> stacks) or due to the bdrv_co_*() code that happens to use coroutines
> but slow for other reasons.

From the 3rd patch (block: support to bypass qemu coroutinue) and the 5th
patch (dataplane: enable selective bypassing coroutine), the only change is
to bypass the coroutine and the BH; the rest of the bdrv code path stays the
same, so the overhead is due to the coroutine implementation, IMO.
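To put a rough number on the stack-switch part of that question, below is a
small standalone test program (not QEMU code, just an illustration; the 64KB
stack and the iteration count are arbitrary) that times raw ucontext
switches. Treat it as a crude data point for a user-space stack switch: the
actual coroutine backends may switch more cheaply (e.g. via
sigsetjmp/siglongjmp), and the per-request cost in QEMU also includes
coroutine allocation/pooling and the completion BH in bdrv_co_do_rw(), which
show up separately in the profile above.

/*
 * Standalone sketch, not QEMU code: time raw ucontext stack switches.
 * Build with: gcc -O2 -o switch_bench switch_bench.c -lrt
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <ucontext.h>

#define ITERATIONS 1000000
#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, co_ctx;

/* "Coroutine" body: immediately switch back to the caller, forever. */
static void co_entry(void)
{
    for (;;) {
        swapcontext(&co_ctx, &main_ctx);
    }
}

int main(void)
{
    char *stack = malloc(STACK_SIZE);
    struct timespec start, end;
    double ns;
    int i;

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = STACK_SIZE;
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, co_entry, 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < ITERATIONS; i++) {
        /* One enter + one yield, i.e. two stack switches per iteration. */
        swapcontext(&main_ctx, &co_ctx);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
    printf("%.1f ns per enter/yield round trip\n", ns / ITERATIONS);

    free(stack);
    return 0;
}

Thanks,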