On 28.11.2014 at 11:28, Paolo Bonzini wrote:
>
> On 28/11/2014 09:13, Peter Lieven wrote:
>> On 27.11.2014 at 17:40, Paolo Bonzini wrote:
>>> On 27/11/2014 11:27, Peter Lieven wrote:
>>>> +static __thread struct CoRoutinePool {
>>>> +    Coroutine *ptrs[POOL_MAX_SIZE];
>>>> +    unsigned int size;
>>>> +    unsigned int nextfree;
>>>> +} CoPool;
>>>>
>>> The per-thread ring unfortunately didn't work well last time it was
>>> tested.  Devices that do not use ioeventfd (not just the slow ones, even
>>> decently performing ones like ahci, nvme or megasas) will create the
>>> coroutine in the VCPU thread, and destroy it in the iothread.  The
>>> result is that coroutines cannot be reused.
>>>
>>> Can you check if this is still the case?
>> I already tested at least for IDE and for ioeventfd=off. The coroutine
>> is created in the vCPU thread and destroyed in the I/O thread.
>>
>> I also have a more complicated version which sets up a per-thread
>> coroutine pool only for dataplane, avoiding the lock for dedicated
>> iothreads.
>>
>> For those who want to take a look:
>>
>> https://github.com/plieven/qemu/commit/325bc4ef5c7039337fa785744b145e2bdbb7b62e
> Can you test it against the patch I just sent in Kevin's linux-aio
> coroutine thread?

Was already doing it ;-) At least with test-coroutine.c...

master:
Run operation 40000000 iterations 12.851414 s, 3112K operations/s, 321ns per coroutine

paolo:
Run operation 40000000 iterations 11.951720 s, 3346K operations/s, 298ns per coroutine

plieven/perf_master2:
Run operation 40000000 iterations 9.013785 s, 4437K operations/s, 225ns per coroutine

plieven/perf_master:
Run operation 40000000 iterations 11.072883 s, 3612K operations/s, 276ns per coroutine

However, perf_master and perf_master2 seem to have a regression regarding
nesting. @Kevin: Could that be the reason why they perform badly in some
scenarios?

Regarding the bypass that is being discussed: if it is not just a benchmark
thing but really necessary for some people's use cases, why not add a new
aio mode like "bypass" and use it only then? If the performance is really
needed, the user might accept losing features like I/O throttling, filters
etc. in exchange.

Peter
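
P.S. For anyone comparing the runs: the derived figures in the benchmark
lines follow directly from the iteration count and the wall-clock time.
Below is a minimal sketch in C that only redoes that arithmetic. The helper
names (ns_per_op, kops_per_sec, speedup, print_summary) are made up for
illustration and are not part of test-coroutine.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers, not taken from QEMU's test code. */

/* Nanoseconds spent per coroutine operation. */
double ns_per_op(double seconds, double iterations)
{
    assert(iterations > 0.0);
    return seconds / iterations * 1e9;
}

/* Throughput in thousands of operations per second ("K operations/s"). */
double kops_per_sec(double seconds, double iterations)
{
    assert(seconds > 0.0);
    return iterations / seconds / 1e3;
}

/* Speedup of a run over a baseline run with the same iteration count. */
double speedup(double baseline_seconds, double seconds)
{
    assert(seconds > 0.0);
    return baseline_seconds / seconds;
}

/* Print one summary row for a branch, relative to a baseline time. */
void print_summary(const char *branch, double seconds,
                   double baseline_seconds, double iterations)
{
    printf("%-22s %8.0fK ops/s %6.0f ns/op  %.2fx vs. baseline\n",
           branch,
           kops_per_sec(seconds, iterations),
           ns_per_op(seconds, iterations),
           speedup(baseline_seconds, seconds));
}
```

For example, print_summary("plieven/perf_master2", 9.013785, 12.851414,
40000000.0) compares perf_master2 against master; by this arithmetic
perf_master2 is roughly 1.4x faster than master on this test.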