On Fri, Aug 1, 2014 at 12:30 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 31/07/2014 18:13, Ming Lei wrote:
>> Below is the 'perf report' result on the cycles event, with and without
>> bypass coroutine:
>>
>> http://pastebin.com/ae0vnQ6V
>>
>> From the profiling result, it looks like bdrv_co_do_preadv() is a bit
>> slow without bypass coroutine.
>
> Yeah, I can count at least 3.3% time spent here:
>
>     0.87%  bdrv_co_do_preadv
>     0.79%  bdrv_aligned_preadv
>     0.71%  qemu_coroutine_switch
>     0.52%  tracked_request_begin
>     0.45%  coroutine_swap
>
> Another ~3% wasted in malloc, etc.
That should be related to the coroutine and the BH in bdrv_co_do_rw().
In this post I didn't apply Stephan's coroutine resize patch, which might
reduce malloc() usage for coroutines. At least, the profile result shows
that coroutines aren't cheap.

>
> I suggest that we discuss it on the phone at next Tuesday's KVM call.

No problem, and we can continue to discuss it on the mailing list too.
If you or anyone else needs more profiling data, please let me know; I am
happy to provide it.

> I'm not denying that bypassing coroutines is a useful thing to do. But
> *everyone* should be doing that if it is so useful, and it's hard to do
> it without causing code duplication.

Yes, I agree, and the patch is designed more or less that way. If a device
knows in advance that a coroutine isn't required, it just calls
qemu_aio_set_bypass_co() to tell the block layer not to use one; that is
the approach taken in this patch.

Thanks,
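The opt-in bypass described above can be pictured with a toy model: a per-device flag, set once by the device, decides at submission time whether a request goes straight to the driver or runs inside a coroutine that has to be created and switched to. This is only a sketch under that assumption. `BlockLayer`, `set_bypass_co` and the read helpers are hypothetical stand-ins, not QEMU's block-layer API, and Python generators stand in for QEMU coroutines.

```python
# Toy model of per-device coroutine bypass.
# All names are hypothetical; this is not QEMU's block-layer API.
from dataclasses import dataclass, field
from typing import Generator, List


@dataclass
class BlockLayer:
    bypass_co: bool = False                          # set by the device when no coroutine is needed
    trace: List[str] = field(default_factory=list)  # records which path each request took

    def set_bypass_co(self, enable: bool) -> None:
        # Hypothetical analogue of the patch's qemu_aio_set_bypass_co().
        self.bypass_co = enable

    def _read_direct(self, offset: int, length: int) -> bytes:
        # Fast path: call the "driver" directly; no coroutine is created or switched to.
        self.trace.append("direct")
        return bytes(length)

    def _read_co(self, offset: int, length: int) -> Generator[None, None, bytes]:
        # Slow path: the request body runs inside a coroutine (a generator here).
        self.trace.append("coroutine")
        yield  # models a coroutine switch, e.g. waiting for I/O completion
        return bytes(length)

    def aio_read(self, offset: int, length: int) -> bytes:
        if self.bypass_co:
            return self._read_direct(offset, length)
        co = self._read_co(offset, length)
        try:
            while True:
                next(co)  # resume the coroutine until it finishes
        except StopIteration as done:
            return done.value  # the generator's return value
```

A device that never needs to yield would call `set_bypass_co(True)` once at setup, after which every `aio_read()` skips the generator creation and the resume loop. The per-request overhead that is avoided (allocation plus switching) is what the profile above attributes to coroutines.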