Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-11 Thread Ingo Molnar
* Waiman Long wrote:
> BTW, I have also been thinking about extracting the spinlock out from the mutex
> structure for some busy mutex by adding a pointer to an external auxiliary
> structure (separately allocated at init time). The idea is to use the external
> spinlock if available. O

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-10 Thread Waiman Long
On 04/10/2013 01:16 PM, Ingo Molnar wrote: * Waiman Long wrote: On 04/10/2013 06:31 AM, Ingo Molnar wrote: * Waiman Long wrote: That said, the MUTEX_SHOULD_XCHG_COUNT macro should die. Why shouldn't all architectures just consider negative counts to be locked? It doesn't matter that some

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-10 Thread Ingo Molnar
* Waiman Long wrote:
> On 04/10/2013 06:31 AM, Ingo Molnar wrote:
> >* Waiman Long wrote:
> >
> >>>That said, the MUTEX_SHOULD_XCHG_COUNT macro should die. Why shouldn't all
> >>>architectures just consider negative counts to be locked? It doesn't matter
> >>>that some might only ever see -1.
>

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-10 Thread Waiman Long
On 04/10/2013 06:31 AM, Ingo Molnar wrote: * Waiman Long wrote: That said, the MUTEX_SHOULD_XCHG_COUNT macro should die. Why shouldn't all architectures just consider negative counts to be locked? It doesn't matter that some might only ever see -1. I think so too. However, I don't have the ma

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-10 Thread Waiman Long
On 04/10/2013 06:28 AM, Ingo Molnar wrote: * Waiman Long wrote: Furthermore, since you are seeing this effect so profoundly, have you considered using another approach, such as queueing all the poll-waiters in some fashion? That would optimize your workload additionally: removing the 'stamped

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-10 Thread Linus Torvalds
On Wed, Apr 10, 2013 at 7:09 AM, Robin Holt wrote:
> On Mon, Apr 08, 2013 at 07:38:39AM -0700, Linus Torvalds wrote:
>>
>> I forget where we saw the case where we should *not* read the initial
>> value, though. Anybody remember?
>
> I think you might be remembering ia64. Fairly early on, I recall

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-10 Thread Robin Holt
On Mon, Apr 08, 2013 at 07:38:39AM -0700, Linus Torvalds wrote:
> On Mon, Apr 8, 2013 at 5:42 AM, Ingo Molnar wrote:
> >
> > AFAICS the main performance trade-off is the following: when the owner CPU unlocks
> > the mutex, we'll poll it via a read first, which turns the cacheline into
> > sha

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-10 Thread Ingo Molnar
* Waiman Long wrote:
> > That said, the MUTEX_SHOULD_XCHG_COUNT macro should die. Why shouldn't all
> > architectures just consider negative counts to be locked? It doesn't matter
> > that some might only ever see -1.
>
> I think so too. However, I don't have the machines to test out other
>

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-10 Thread Ingo Molnar
* Waiman Long wrote:
> > Furthermore, since you are seeing this effect so profoundly, have you
> > considered using another approach, such as queueing all the poll-waiters in
> > some fashion?
> >
> > That would optimize your workload additionally: removing the 'stampede' of
> > trylock attem

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-08 Thread Waiman Long
On 04/08/2013 10:38 AM, Linus Torvalds wrote: On Mon, Apr 8, 2013 at 5:42 AM, Ingo Molnar wrote: AFAICS the main performance trade-off is the following: when the owner CPU unlocks the mutex, we'll poll it via a read first, which turns the cacheline into shared-read MESI state. Then we notice t

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-08 Thread Waiman Long
On 04/08/2013 08:42 AM, Ingo Molnar wrote: * Waiman Long wrote: In the __mutex_lock_common() function, an initial entry into the lock slow path will cause two atomic_xchg instructions to be issued. Together with the atomic decrement in the fast path, a total of three atomic read-modify-write i

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-08 Thread Ingo Molnar
* Linus Torvalds wrote:
> On Mon, Apr 8, 2013 at 5:42 AM, Ingo Molnar wrote:
> >
> > AFAICS the main performance trade-off is the following: when the owner CPU unlocks
> > the mutex, we'll poll it via a read first, which turns the cacheline into
> > shared-read MESI state. Then we notice t

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-08 Thread Linus Torvalds
On Mon, Apr 8, 2013 at 5:42 AM, Ingo Molnar wrote:
>
> AFAICS the main performance trade-off is the following: when the owner CPU unlocks
> the mutex, we'll poll it via a read first, which turns the cacheline into
> shared-read MESI state. Then we notice that its content signals 'lock is
> avai

Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-08 Thread Ingo Molnar
* Waiman Long wrote:
> In the __mutex_lock_common() function, an initial entry into
> the lock slow path will cause two atomic_xchg instructions to be
> issued. Together with the atomic decrement in the fast path, a total
> of three atomic read-modify-write instructions will be issued in
> rapid

[PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

2013-04-04 Thread Waiman Long
In the __mutex_lock_common() function, an initial entry into the lock slow path will cause two atomic_xchg instructions to be issued. Together with the atomic decrement in the fast path, a total of three atomic read-modify-write instructions will be issued in rapid succession. This can cause a lot
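
For readers skimming the thread, the "read first, then attempt the atomic operation" idea discussed above can be sketched roughly as follows. This is a userspace illustration using C11 atomics, not the kernel patch itself: struct sketch_mutex, sketch_trylock_read_first() and sketch_unlock() are hypothetical names, the count convention (1 = unlocked, 0 or negative = locked) is assumed from the discussion of negative counts, and the unlock path is deliberately simplified (no waiter wake-up).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mutex count field.
 * Assumed convention: 1 = unlocked, 0 or negative = locked. */
struct sketch_mutex {
    atomic_int count;
};

static void sketch_mutex_init(struct sketch_mutex *m)
{
    atomic_init(&m->count, 1);
}

/* Attempt to take the lock, issuing the atomic exchange only when a
 * plain read suggests the lock is free. While the lock is held, callers
 * only read the count and perform no read-modify-write on it. */
static bool sketch_trylock_read_first(struct sketch_mutex *m)
{
    if (atomic_load_explicit(&m->count, memory_order_relaxed) != 1)
        return false;   /* looks locked: skip the atomic exchange */

    /* The lock looked free; the exchange decides who actually gets it.
     * Losing the race just leaves the count at -1, which still means locked. */
    return atomic_exchange_explicit(&m->count, -1, memory_order_acquire) == 1;
}

/* Simplified unlock: a real mutex would also check for and wake waiters. */
static void sketch_unlock(struct sketch_mutex *m)
{
    assert(atomic_load_explicit(&m->count, memory_order_relaxed) != 1);
    atomic_store_explicit(&m->count, 1, memory_order_release);
}
```

The trade-off raised in the replies still applies to this sketch: the initial read can pull the cacheline into a shared state before the exchange needs it exclusively, so whether the extra read helps depends on the level of contention and on the architecture.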