On Tue, Feb 14, 2012 at 09:33, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Mon, Feb 13, 2012 at 04:11:15PM +0100, Alex Barcelo wrote:
>> This new implementation... well, it seems to work (I have done an
>> ubuntu installation with a cdrom and a qcow drive, which seems to use
>> quite a lot of coroutines). Of course I have done the coroutine-test
>> and it was OK. But... I wasn't confident enough to propose it as a
>> "mature alternative". And I don't have any performance benchmark,
>> which would be interesting. So, I thought that the better option would
>> be to send this patch to the developers as an alternative to ucontext.
>
> As a starting point, I suggest looking at
> test-coroutine.c:perf_lifecycle(). It's a simple create-and-then-enter
> benchmark which measures the latency of doing this. I expect you will
> find performance is identical to the ucontext version because the
> coroutine should be pooled and created using sigaltstack only once.
>
> The interesting thing would be to benchmark ucontext coroutine creation
> against sigaltstack. Even then it may not matter much as long as pooled
> coroutines are used most of the time.

I hadn't seen the performance mode of test-coroutine. With it, a
benchmark is easy to write (it's half-done already).

The lifecycle test is not a good benchmark here, because sigaltstack is
only called once. (As you said, the timing changes by less than 1%.) I
thought it would be interesting to add a performance test for nesting,
which can be coroutine-creation intensive, so I wrote one. I will send
it as a patch; it is simple, but it works for this purpose.

The preliminary results are:

ucontext (traditional) method:
MSG: Nesting 1000000 iterations of 100000 depth each: 0.452988 s

sigaltstack (new) method:
MSG: Nesting 1000000 iterations of 100000 depth each: 0.689649 s

The sigaltstack version is slower (roughly 1.5x the ucontext time),
which doesn't surprise me: it is more complicated, does more jumps, and
has a more erratic code flow. But a loss of efficiency in coroutine
creation should not be important (how many coroutines are created in a
typical qemu-system execution? I'm thinking "one"). Also, as you said ;)
pooled coroutines are used most of the time in real qemu-system
execution.

tl;dr: the longjmps are the same and equally good (or bad) either way,
so performance won't really notice the change. Will it?
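
For illustration only, here is a minimal standalone sketch of what a
creation-heavy nesting test can look like. It is not the patch mentioned
above and does not use QEMU's coroutine API: it uses plain ucontext
calls (getcontext, makecontext, swapcontext), and every name in it
(Level, run_level, nest_benchmark, STACK_SIZE) is hypothetical. Each
level creates a fresh context with its own stack and enters it, so the
creation cost dominates, which is the case the lifecycle test with
pooling does not exercise.

```c
#include <stdlib.h>
#include <ucontext.h>

/* Assumed stack size per context; QEMU's real value may differ. */
#define STACK_SIZE (64 * 1024)

/* One nesting level: the new context plus the place to return to. */
typedef struct Level {
    ucontext_t ctx;     /* context created for this level */
    ucontext_t parent;  /* saved context of whoever entered this level */
    void *stack;        /* heap-allocated stack for ctx */
    unsigned depth;     /* 1-based nesting depth of this level */
} Level;

/* makecontext() only passes int arguments portably, so the level being
 * entered is handed over through a global instead. */
static Level *current;
static unsigned max_depth;
static unsigned long created;

static void run_level(unsigned depth);

/* Body of every context: nest one level deeper until max_depth. */
static void entry(void)
{
    Level *self = current;

    if (self->depth < max_depth) {
        run_level(self->depth + 1);
    }
    /* Returning resumes uc_link, i.e. the parent of this level. */
}

/* Create a new context, enter it, and free its stack once it ends. */
static void run_level(unsigned depth)
{
    Level lv;

    lv.stack = malloc(STACK_SIZE);
    if (lv.stack == NULL || getcontext(&lv.ctx) != 0) {
        abort();
    }
    lv.ctx.uc_stack.ss_sp = lv.stack;
    lv.ctx.uc_stack.ss_size = STACK_SIZE;
    lv.ctx.uc_link = &lv.parent;
    lv.depth = depth;
    makecontext(&lv.ctx, entry, 0);
    created++;

    current = &lv;
    if (swapcontext(&lv.parent, &lv.ctx) != 0) {
        abort();
    }
    free(lv.stack);
}

/* Run `iterations` nestings of `depth` levels each and return the
 * number of contexts that were created. */
unsigned long nest_benchmark(unsigned iterations, unsigned depth)
{
    unsigned i;

    created = 0;
    max_depth = depth;
    for (i = 0; i < iterations; i++) {
        if (depth > 0) {
            run_level(1);
        }
    }
    return created;
}
```

Wrapping a call such as nest_benchmark(1000, 100) between two
clock_gettime(CLOCK_MONOTONIC, ...) readings gives a creation-dominated
timing. All `depth` stacks are alive at the same time, so memory use
grows as depth * STACK_SIZE; keep depth modest when trying this out.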