On Sun, Aug 23, 2015 at 18:04:46 -0700, Paolo Bonzini wrote: > On 23/08/2015 17:23, Emilio G. Cota wrote: > > On some parallel workloads this gives up to a 15% speed improvement. > > > > Signed-off-by: Emilio G. Cota <c...@braap.org> > > --- > > include/qemu/thread-posix.h | 47 ++++++++++++++++++++++++++++++++++++++++++ > > include/qemu/thread.h | 6 ------ > > util/qemu-thread-posix.c | 50 > > +++++---------------------------------------- > > 3 files changed, 52 insertions(+), 51 deletions(-) (snip) > Applied, but in the end the spinlock will probably simply use a simple > test-and-test-and-set lock, or an MCS lock. There is no need to use > pthreads for this.
Agreed. In fact in my tests I sometimes use concurrencykit [http://concurrencykit.org/] to test lock alternatives (would love to be able to just add ck as a submodule of qemu, but they do not support as many architectures as qemu does). Note that fair locks (such as MCS) for user-space programs are not necessarily a good idea when preemption is considered--and for usermode we'd be forced (if we allowed MCS's to nest) to use per-lock stack variables given that the number of threads is unbounded, which is pretty ugly. If contention is a problem, a simple, fast spinlock combined with an exponential backoff is already pretty good. Fairness is not a requirement (the cache substrate of a NUMA machine isn't necessarily fair, is it?); scalability is. If the algorithm in the guest requires fairness, the guest must use a fair lock (e.g. MCS), and that works as intended when run natively or under qemu. I just tested a fetch-and-swap+exp.backoff spinlock with usermode on a program that spawns N threads and each thread performs an 2**M atomic increments on the same variable. That is, a degenerate worst-case kind of contention. N varies from 1 to 64, and M=15 on all runs, 5 runs per experiment: http://imgur.com/XpYctyT With backoff, the per-access latency grows roughly linearly with the number of cores, i.e. this is scalable. The other two are clearly superlinear. The fastest spinlock as per ck's documentation (for uncontended cases) is the fetch-and-swap lock. I just re-ran the usermode experiments from yesterday with fas and fas+exp.backoff: http://imgur.com/OK2WZg8 There really isn't much difference among the three candidates. In light of these results I see very little against going for a solution with exponential backoff. Thanks, Emilio