On Wed, Aug 6, 2014 at 5:33 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> This can be used to compute the cost of coroutine operations. In the
> end the cost of the function call is a few clock cycles, so it's pretty
> cheap for now, but it may become more relevant as the coroutine code
> is optimized.
>
> For example, here are the results on my machine:
>
> Function call 100000000 iterations: 0.173884 s
> Yield 100000000 iterations: 8.445064 s
> Lifecycle 1000000 iterations: 0.098445 s
> Nesting 10000 iterations of 1000 depth each: 7.406431 s
>
> One yield takes 83 nanoseconds, one enter takes 97 nanoseconds,
> one coroutine allocation takes (roughly, since some of the allocations
> in the nesting test do hit the pool) 739 nanoseconds:
>
> (8.445064 - 0.173884) * 10^9 / 100000000 = 82.7
> (0.098445 * 100 - 0.173884) * 10^9 / 100000000 = 96.7
> (7.406431 * 10 - 0.173884) * 10^9 / 100000000 = 738.9
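As a quick sanity check, the per-operation numbers quoted above can be reproduced from the raw benchmark totals (a sketch; the constants are copied from the quoted mail, and each total is normalized to 10^8 operations before subtracting the plain function-call baseline):

```python
# Recompute per-operation costs from the quoted benchmark totals.
baseline_s  = 0.173884   # Function call, 1e8 iterations
yield_s     = 8.445064   # Yield, 1e8 iterations
lifecycle_s = 0.098445   # Lifecycle, 1e6 iterations (scale by 100)
nesting_s   = 7.406431   # Nesting, 1e4 iters * 1000 depth = 1e7 ops (scale by 10)

ns_per_yield = (yield_s - baseline_s) * 1e9 / 1e8
ns_per_enter = (lifecycle_s * 100 - baseline_s) * 1e9 / 1e8
ns_per_alloc = (nesting_s * 10 - baseline_s) * 1e9 / 1e8

print(f"yield: {ns_per_yield:.1f} ns")   # ~82.7
print(f"enter: {ns_per_enter:.1f} ns")   # ~96.7
print(f"alloc: {ns_per_alloc:.1f} ns")   # ~738.9
```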
Having thought about it further, the above is _not_ cheap once the cost is put in context. Take the block layer as an example:

- suppose one block device can reach 300K IOPS, like Kevin's loop over a tmpfs file, so handling one I/O takes 3.333us
- handling one I/O requires at least two enters and one yield in the current implementation, so these three operations take 0.277us (0.083 + 0.097 * 2) per I/O (assuming all allocations hit the pool)
- so coroutine-only operations account for 8.31% (0.277 / 3.333) of the per-I/O time, not to mention the effect of switching stacks

Some modern storage devices can handle millions of IOPS, so using coroutines will surely slow these devices down considerably.

Thanks,
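The overhead estimate above can be sketched in a few lines (assumptions, per the argument: two enters plus one yield per I/O, all allocations hitting the pool, and the per-operation costs measured earlier):

```python
# Estimate what fraction of the per-I/O time budget the coroutine
# operations consume, at a given device IOPS rate.
YIELD_US = 0.083   # one yield: 83 ns
ENTER_US = 0.097   # one enter: 97 ns

def coroutine_overhead(iops):
    io_budget_us = 1e6 / iops                 # time budget per I/O
    coroutine_us = YIELD_US + 2 * ENTER_US    # 2 enters + 1 yield per I/O
    return coroutine_us / io_budget_us

print(f"300K IOPS: {coroutine_overhead(300_000):.2%}")   # ~8.31%
print(f"1M   IOPS: {coroutine_overhead(1_000_000):.2%}") # grows linearly with IOPS
```

Since the coroutine cost per I/O is fixed while the per-I/O budget shrinks as IOPS rise, the relative overhead grows linearly with device speed.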