On Fri, Sep 20, 2013 at 10:45 AM, Will Deacon <will.dea...@arm.com> wrote:
>
> Right, turns out I can get some interesting numbers from your simple t.c
> program on my dual-cluster, 5 CPU ARMv7 machine. The new cmpxchg-based lockref
> code gives ~50% improvement, but the fun part is that implementing cmpxchg64
> without memory barriers doubles this win to ~100% over current mainline.

Ok, that's certainly noticeable.

> If we can guarantee that the code just messes around with the lockref, those
> barriers probably aren't needed...

Yes. I've been thinking about the barrier issue, and as far as I can
see, as long as the lockref code only ever messes with the reference
count, a totally unordered cmpxchg is fine. And at least right now we
indeed only ever mess with the reference count.

I have been idly toying with the concept of using the cmpxchg also for
possibly taking the lock (for the "xyz_or_lock" versions), but every
time I look at it it seems unlikely to help, and it would require
memory ordering and bring in various architecture-dependent issues, so
I suspect it's never going to make much sense.

So yes, an unordered cmpxchg64 should be perfectly fine.

> As for AIM7/re-aim, I'm having a hard time getting repeatable numbers out of
> it to establish a baseline, so it's not proving to be especially helpful.

That's fine, and yeah, I doubt the t.c improvement really shows
anywhere else (it's kind of extreme), but your numbers are certainly
already sufficient to say "ok, it makes sense even on 32-bit
machines".

              Linus
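
For readers following along, here is a minimal userspace sketch of the idea discussed above: bump a reference count with an unordered 64-bit compare-and-exchange, but only while the lock half of the word looks unlocked. This is not the kernel's lockref code. The names (sketch_lockref, sketch_lockref_get_fast, sketch_lockref_count), the packing of lock and count into one 64-bit word, and the use of C11 memory_order_relaxed as a stand-in for a barrier-less cmpxchg64 are all assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout (an assumption, not the kernel's struct lockref):
 * low 32 bits hold the lock word (0 means unlocked),
 * high 32 bits hold the reference count.
 */
typedef struct {
    _Atomic uint64_t lock_count;
} sketch_lockref;

#define SKETCH_LOCK_MASK 0xffffffffULL
#define SKETCH_COUNT_ONE (1ULL << 32)

/*
 * Try to take a reference without touching the lock.
 * Returns true if the count was incremented, false if the lock half
 * was non-zero and the caller would have to fall back to locking.
 */
static bool sketch_lockref_get_fast(sketch_lockref *ref)
{
    uint64_t old = atomic_load_explicit(&ref->lock_count,
                                        memory_order_relaxed);

    while ((old & SKETCH_LOCK_MASK) == 0) {
        uint64_t next = old + SKETCH_COUNT_ONE;

        /*
         * Relaxed on both success and failure: only the count changes
         * and nothing else is published through this word, so no
         * ordering against other memory accesses is requested.
         * On failure, 'old' is refreshed with the current value.
         */
        if (atomic_compare_exchange_weak_explicit(&ref->lock_count,
                                                  &old, next,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}

/* Read the reference count half of the word. */
static uint32_t sketch_lockref_count(sketch_lockref *ref)
{
    uint64_t v = atomic_load_explicit(&ref->lock_count,
                                      memory_order_relaxed);
    return (uint32_t)(v >> 32);
}
```

Relaxed ordering is only plausible here because the fast path never acquires the lock. A variant that took the lock through the same cmpxchg would need acquire semantics at the very least, which is the memory-ordering cost mentioned above.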