On Fri, Sep 4, 2015 at 1:29 AM, Dave Chinner <da...@fromorbit.com> wrote: > ... > 0.02 1c: pause > 4.45 mov %ecx,%eax > 0.00 lock cmpxchg %edx,(%rdi) > 95.18 test %eax,%eax > jne 1c ... > It looks like it's spending all it's time looping around the cmpxchg.
That code sequence doesn't look sensible. Busy-looping on a cmpxchg is insane - if you are busy-looping, you should always make sure the inner tight loop is done while waiting for the value. It seems to come from virt_queued_spin_lock(), and that just looks like completely bogus crap. PeterZ, this is your magical hypervisor thing, and I get the feeling that that explains why Dave sees nasty performance: most people have tested either on raw hardware or using the actual paravirtualized ones, but this is the case for "we're running with a hypervisor, but not paravirtualized". So virt_queued_spin_lock() for the hypervisor case looks completely buggered to me for several reasons: - it doesn't actually ever use any queueing, since it always returns true so the "queued spinlocks" in this case aren't actually queued, and they aren't even ticket-locks, they are just plain 0/1 values if I read things right. - the busy-loop to set the queued spinlock uses that cmpxchg in a tight loop, which kills any memory subsystem. That's unacceptable. So at the very *minimum*, that second issue should be fixed, and the loop in virt_queued_spin_lock() should look something like do { while (READ_ONCE(lock->val) != 0) cpu_relax(); } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0); which at least has a chance in hell of behaving well on the bus and in a HT environment. But I suspect that it would be even better for Dave to just disable the whole thing, and see how the queued locks actually work. Dave, can you turn that virt_queued_spin_lock() into just "return false"? In fact, I would almost _insist_ we do this when CONFIG_PARAVIRT_SPINLOCK isn't set, isn't that what our old ticket-spinlocks did? They didn't screw up and degrade to a test-and-set lock just because they saw a hypervisor - that only happened when things were paravirt-aware. No? Dave, if you have the energy, try it both ways. But the code as-is for "I'm running in a hypervisor" looks just terminally broken. People who didn't run in hypervisors just never saw the breakage. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/