[tip:locking/core] locking/mcs: Use smp_cond_load_acquire() in MCS spin loop

2018-04-27 Thread tip-bot for Jason Low
Commit-ID:  7f56b58a92aaf2cab049f32a19af7cc57a3972f2
Gitweb:     https://git.kernel.org/tip/7f56b58a92aaf2cab049f32a19af7cc57a3972f2
Author:     Jason Low
AuthorDate: Thu, 26 Apr 2018 11:34:22 +0100
Committer:  Ingo Molnar
CommitDate: Fri, 27 Apr 2018 09:48:49 +0200

locking/mcs: Use
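
For context, the change amounts to roughly the following in kernel/locking/mcs_spinlock.h (a sketch based on the commit summary; smp_cond_load_acquire() spins with acquire semantics until the tested condition holds, which lets architectures such as ARM64 wait with WFE instead of busy-polling):

    /* Before: open-coded acquire-load spin loop. */
    while (!(smp_load_acquire(l)))
            cpu_relax();

    /* After: the architecture decides how to wait for *l to become non-zero. */
    smp_cond_load_acquire(l, VAL);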

Re: [PATCH -v4 2/8] locking/mutex: Rework mutex::owner

2016-10-12 Thread Jason Low
On Wed, 2016-10-12 at 10:59 -0700, Davidlohr Bueso wrote:
> On Fri, 07 Oct 2016, Peter Zijlstra wrote:
> >+/*
> >+ * Optimistic trylock that only works in the uncontended case. Make sure to
> >+ * follow with a __mutex_trylock() before failing.
> >+ */
> >+static __always_inline bool __mutex_tryloc
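
The helper being quoted is the uncontended fast path of the reworked mutex, where the owner word doubles as the lock word. Its shape is roughly the following (a sketch, not necessarily the final mainline code):

    static __always_inline bool __mutex_trylock_fast(struct mutex *lock)
    {
            unsigned long curr = (unsigned long)current;

            /* Succeeds only when owner == 0: nobody holds the lock and
             * no flag bits are set. Contended cases must follow up with
             * the full __mutex_trylock(). */
            return atomic_long_cmpxchg_acquire(&lock->owner, 0UL, curr) == 0UL;
    }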

Re: [PATCH -v4 0/8] locking/mutex: Rewrite basic mutex

2016-10-11 Thread Jason Low
---
|  100 -  900 | 76,362 JPM | 76,298 JPM |
| 1000 - 1900 | 77,146 JPM | 76,061 JPM |
---------

Tested-by: Jason Low

Re: [RFC PATCH-tip v4 01/10] locking/osq: Make lock/unlock proper acquire/release barrier

2016-10-06 Thread Jason Low
On Wed, Oct 5, 2016 at 10:47 PM, Davidlohr Bueso wrote:
> On Wed, 05 Oct 2016, Waiman Long wrote:
>
>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
>> index 05a3785..1e6823a 100644
>> --- a/kernel/locking/osq_lock.c
>> +++ b/kernel/locking/osq_lock.c
>> @@ -12,6 +12,23 @@
>>

Re: [RFC PATCH-tip v4 01/10] locking/osq: Make lock/unlock proper acquire/release barrier

2016-10-04 Thread Jason Low
On Tue, Oct 4, 2016 at 12:06 PM, Davidlohr Bueso wrote: > On Thu, 18 Aug 2016, Waiman Long wrote: > >> The osq_lock() and osq_unlock() function may not provide the necessary >> acquire and release barrier in some cases. This patch makes sure >> that the proper barriers are provided when osq_lock()

Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex

2016-08-23 Thread Jason Low
On Tue, 2016-08-23 at 09:35 -0700, Jason Low wrote: > On Tue, 2016-08-23 at 09:17 -0700, Davidlohr Bueso wrote: > > I have not looked at the patches yet, but are there any performance minutia > > to be aware of? > > This would remove all of the mutex architecture specific o

Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex

2016-08-23 Thread Jason Low
On Tue, 2016-08-23 at 09:17 -0700, Davidlohr Bueso wrote: > What's the motivation here? Is it just to unify counter and owner for > the starvation issue? If so, is this really the path we wanna take for > a small debug corner case? And we thought our other patch was a bit invasive :-) > I have n

Re: [PATCH v4] locking/mutex: Prevent lock starvation when spinning is disabled

2016-08-18 Thread Jason Low
On Thu, 2016-08-18 at 17:39 -0700, Jason Low wrote: > Imre reported an issue where threads are getting starved when trying > to acquire a mutex. Threads acquiring a mutex can get arbitrarily delayed > sleeping on a mutex because other threads can continually steal the lock > in the fas

Re: [PATCH v5 3/3] locking/mutex: Ensure forward progress of waiter-spinner

2016-08-18 Thread Jason Low
On Thu, 2016-08-18 at 17:58 +0200, Peter Zijlstra wrote: > On Thu, Aug 11, 2016 at 11:01:27AM -0400, Waiman Long wrote: > > The following is the updated patch that should fix the build error in > > non-x86 platform. > > > > This patch was whitespace challenged, but I think I munged it properly. >

[PATCH v4] locking/mutex: Prevent lock starvation when spinning is disabled

2016-08-18 Thread Jason Low
for too long.

Reported-by: Imre Deak
Signed-off-by: Jason Low
---
 include/linux/mutex.h  |   2 +
 kernel/locking/mutex.c | 122 +++--
 2 files changed, 99 insertions(+), 25 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index
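
A sketch of the mechanism (the helper and flag names appear elsewhere in this thread; the threshold and other details here are illustrative only):

    #define MUTEX_WAKEUP_THRESHOLD  16      /* illustrative value */

    static inline void do_yield_to_waiter(struct mutex *lock, int *wakeups)
    {
            if (*wakeups < MUTEX_WAKEUP_THRESHOLD) {
                    (*wakeups)++;
                    return;
            }

            /* Woken up too many times without getting the lock: tell
             * fastpath lock stealers and optimistic spinners to back
             * off so the top waiter can make forward progress. */
            if (!lock->yield_to_waiter)
                    lock->yield_to_waiter = true;
    }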

Re: [PATCH-queue/locking/core] locking/mutex: Unify yield_to_waiter & waiter_spinning

2016-08-18 Thread Jason Low
ct mutex *lock,
>  * Turn on the waiter spinning flag to discourage the spinner
>  * from getting the lock.

Might want to update this comment to "Turn on the yield to waiter flag to discourage optimistic spinners from stealing the lock." Besides that:

Acked-by: Jason Low

Re: [PATCH v2] locking/mutex: Prevent lock starvation when spinning is enabled

2016-08-17 Thread Jason Low
Hi Wanpeng, On Wed, 2016-08-17 at 09:41 +0800, Wanpeng Li wrote: > 2016-08-11 2:44 GMT+08:00 Jason Low : > > Imre reported an issue where threads are getting starved when trying > > to acquire a mutex. Threads acquiring a mutex can get arbitrarily delayed > > sleeping on

Re: [PATCH v2] locking/mutex: Prevent lock starvation when spinning is enabled

2016-08-16 Thread Jason Low
;
> url:
> https://github.com/0day-ci/linux/commits/Jason-Low/locking-mutex-Prevent-lock-starvation-when-spinning-is-enabled/20160811-034327
> config: x86_64-randconfig-x013-201632 (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>

Re: [PATCH v2] locking/mutex: Prevent lock starvation when spinning is enabled

2016-08-16 Thread Jason Low
On Thu, 2016-08-11 at 11:40 -0400, Waiman Long wrote:
> On 08/10/2016 02:44 PM, Jason Low wrote:
> > +static inline void do_yield_to_waiter(struct mutex *lock, int *wakeups)
> > +{
> > +        return;
> > +}
> > +
> > +static inline void clear_yield_to_waiter(st

Re: [PATCH v2] locking/mutex: Prevent lock starvation when spinning is enabled

2016-08-10 Thread Jason Low
On Wed, 2016-08-10 at 11:44 -0700, Jason Low wrote:
> @@ -917,11 +976,12 @@ EXPORT_SYMBOL(mutex_trylock);
>  int __sched
>  __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>  {
> -        int ret;
> +        int ret = 1;
>
>          might_

Re: [PATCH v2] locking/mutex: Prevent lock starvation when spinning is enabled

2016-08-10 Thread Jason Low
On Wed, 2016-08-10 at 11:44 -0700, Jason Low wrote: > Imre reported an issue where threads are getting starved when trying > to acquire a mutex. Threads acquiring a mutex can get arbitrarily delayed > sleeping on a mutex because other threads can continually steal the lock > in the fas

[PATCH v2] locking/mutex: Prevent lock starvation when spinning is enabled

2016-08-10 Thread Jason Low
for too long.

Reported-by: Imre Deak
Signed-off-by: Jason Low
---
v1->v2:
- Addressed Waiman's suggestions of needing the yield_to_waiter flag
  only in the CONFIG_SMP case.
- Make sure to only clear the flag if the thread is the top waiter.
- Refactor code to clear flag into an inline fun

Re: [RFC] Avoid mutex starvation when optimistic spinning is disabled

2016-07-22 Thread Jason Low
On Fri, 2016-07-22 at 12:34 +0300, Imre Deak wrote: > On to, 2016-07-21 at 15:29 -0700, Jason Low wrote: > > On Wed, 2016-07-20 at 14:37 -0400, Waiman Long wrote: > > > On 07/20/2016 12:39 AM, Jason Low wrote: > > > > On Tue, 2016-07-19 at 16:04 -0700, Jaso

Re: [RFC] Avoid mutex starvation when optimistic spinning is disabled

2016-07-21 Thread Jason Low
On Wed, 2016-07-20 at 14:37 -0400, Waiman Long wrote: > On 07/20/2016 12:39 AM, Jason Low wrote: > > On Tue, 2016-07-19 at 16:04 -0700, Jason Low wrote: > >> Hi Imre, > >> > >> Here is a patch which prevents a thread from spending too much "time"

Re: [RFC] Avoid mutex starvation when optimistic spinning is disabled

2016-07-21 Thread Jason Low
On Wed, 2016-07-20 at 16:29 +0300, Imre Deak wrote: > On ti, 2016-07-19 at 21:39 -0700, Jason Low wrote: > > On Tue, 2016-07-19 at 16:04 -0700, Jason Low wrote: > > > Hi Imre, > > > > > > Here is a patch which prevents a thread from spending too much "time"

Re: [RFC] Avoid mutex starvation when optimistic spinning is disabled

2016-07-19 Thread Jason Low
On Tue, 2016-07-19 at 16:04 -0700, Jason Low wrote: > Hi Imre, > > Here is a patch which prevents a thread from spending too much "time" > waiting for a mutex in the !CONFIG_MUTEX_SPIN_ON_OWNER case. > > Would you like to try this out and see if this addresses the

[RFC] Avoid mutex starvation when optimistic spinning is disabled

2016-07-19 Thread Jason Low
s disabled? Thanks.

---
Signed-off-by: Jason Low
---
 include/linux/mutex.h  |  2 ++
 kernel/locking/mutex.c | 61 +-
 2 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2cb7531..c1ca68d 10

Re: [RFC] locking/mutex: Fix starvation of sleeping waiters

2016-07-19 Thread Jason Low
On Tue, 2016-07-19 at 19:53 +0300, Imre Deak wrote: > On ma, 2016-07-18 at 10:47 -0700, Jason Low wrote: > > On Mon, 2016-07-18 at 19:15 +0200, Peter Zijlstra wrote: > > > I think we went over this before, that will also completely destroy > > > performance

Re: [PATCH v3 0/3] locking/mutex: Enable optimistic spinning of lock waiter

2016-07-18 Thread Jason Low
ceable impact on system > performance. > > This patchset tries to address 2 issues with Peter's patch: > > 1) Ding Tianhong still find that hanging task could happen in some cases. > 2) Jason Low found that there was performance regression for some AIM7 > workloads.

Re: [RFC] locking/mutex: Fix starvation of sleeping waiters

2016-07-18 Thread Jason Low
On Mon, 2016-07-18 at 19:15 +0200, Peter Zijlstra wrote: > On Mon, Jul 18, 2016 at 07:16:47PM +0300, Imre Deak wrote: > > Currently a thread sleeping on a mutex wait queue can be delayed > > indefinitely by other threads managing to steal the lock, that is > > acquiring the lock out-of-order before

[tip:locking/core] locking/rwsem: Convert sem->count to 'atomic_long_t'

2016-06-08 Thread tip-bot for Jason Low
Commit-ID:  8ee62b1870be8e630158701632a533d0378e15b8
Gitweb:     http://git.kernel.org/tip/8ee62b1870be8e630158701632a533d0378e15b8
Author:     Jason Low
AuthorDate: Fri, 3 Jun 2016 22:26:02 -0700
Committer:  Ingo Molnar
CommitDate: Wed, 8 Jun 2016 15:16:42 +0200

locking/rwsem: Convert sem

[PATCH v2 1/2] locking/rwsem: Convert sem->count to atomic_long_t

2016-06-03 Thread Jason Low
add,update} definitions across the various architectures.

Suggested-by: Peter Zijlstra
Signed-off-by: Jason Low
---
 arch/alpha/include/asm/rwsem.h | 26 +-
 arch/ia64/include/asm/rwsem.h  | 24 
 include/asm-generic/rwsem.h    |  6 +++---
 include/lin
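
The conversion itself is mechanical; per the changelog it boils down to the following (a sketch):

    /* Before: a per-architecture helper hiding an atomic add on a plain long. */
    rwsem_atomic_add(RWSEM_ACTIVE_READ_BIAS, sem);

    /* After: sem->count is an atomic_long_t, so use the generic atomics. */
    atomic_long_add(RWSEM_ACTIVE_READ_BIAS, &sem->count);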

[PATCH v2 2/2] Remove rwsem_atomic_add() and rwsem_atomic_update()

2016-06-03 Thread Jason Low
The rwsem-xadd count has been converted to an atomic variable and the rwsem code now directly uses atomic_long_add() and atomic_long_add_return(), so we can remove the arch implementations of rwsem_atomic_add() and rwsem_atomic_update(). Signed-off-by: Jason Low --- arch/alpha/include/asm

[PATCH v2 0/2] locking/rwsem: Convert rwsem count to atomic_long_t

2016-06-03 Thread Jason Low
to an atomic_long_t since it is used as an atomic variable. This allows us to also remove the rwsem_atomic_{add,update} abstraction and reduce 100+ lines of code. Jason Low (2): locking/rwsem: Convert sem->count to atomic_long_t Remove rwsem_atomic_add() and rwsem_atomic_update() arch/alph

Re: [RFC][PATCH 0/7] locking/rwsem: Convert rwsem count to atomic_long_t

2016-06-03 Thread Jason Low
On Sat, 2016-06-04 at 00:36 +0200, Peter Zijlstra wrote:
> On Fri, Jun 03, 2016 at 11:09:54AM -0700, Jason Low wrote:
> > --- a/arch/alpha/include/asm/rwsem.h
> > +++ b/arch/alpha/include/asm/rwsem.h
> > @@ -25,8 +25,8 @@ static inline void __down_read(struct rw_semaphore *sem

Re: [RFC][PATCH 0/7] locking/rwsem: Convert rwsem count to atomic_long_t

2016-06-03 Thread Jason Low
On Fri, 2016-06-03 at 10:04 +0200, Ingo Molnar wrote: > * Peter Zijlstra wrote: > > > On Mon, May 16, 2016 at 06:12:25PM -0700, Linus Torvalds wrote: > > > On Mon, May 16, 2016 at 5:37 PM, Jason Low wrote: > > > > > > > > This rest of the s

[tip:locking/core] locking/mutex: Set and clear owner using WRITE_ONCE()

2016-06-03 Thread tip-bot for Jason Low
Commit-ID:  6e2814745c67ab422b86262b05e6f23a56f28aa3
Gitweb:     http://git.kernel.org/tip/6e2814745c67ab422b86262b05e6f23a56f28aa3
Author:     Jason Low
AuthorDate: Fri, 20 May 2016 15:19:36 -0700
Committer:  Ingo Molnar
CommitDate: Fri, 3 Jun 2016 12:06:10 +0200

locking/mutex: Set and clear

[tip:locking/core] locking/rwsem: Optimize write lock by reducing operations in slowpath

2016-06-03 Thread tip-bot for Jason Low
Commit-ID:  c0fcb6c2d332041256dc55d8a1ec3c0a2d0befb8
Gitweb:     http://git.kernel.org/tip/c0fcb6c2d332041256dc55d8a1ec3c0a2d0befb8
Author:     Jason Low
AuthorDate: Mon, 16 May 2016 17:38:00 -0700
Committer:  Ingo Molnar
CommitDate: Fri, 3 Jun 2016 09:47:13 +0200

locking/rwsem: Optimize write

[PATCH v3] locking/mutex: Set and clear owner using WRITE_ONCE()

2016-05-24 Thread Jason Low
use a partially written owner value. This is not necessary in the debug case where the owner gets modified with the wait_lock held.

Signed-off-by: Jason Low
Acked-by: Davidlohr Bueso
Acked-by: Waiman Long
---
 kernel/locking/mutex-debug.h |  5 +
 kernel/locking/mutex.h       | 10

Re: [PATCH] locking/mutex: Set and clear owner using WRITE_ONCE()

2016-05-23 Thread Jason Low
On Mon, 2016-05-23 at 14:31 -0700, Davidlohr Bueso wrote: > On Mon, 23 May 2016, Jason Low wrote: > > >On Fri, 2016-05-20 at 18:00 -0700, Davidlohr Bueso wrote: > >> On Fri, 20 May 2016, Waiman Long wrote: > >> > >> >I think mutex-debug.h a

Re: [PATCH] locking/mutex: Set and clear owner using WRITE_ONCE()

2016-05-23 Thread Jason Low
On Fri, 2016-05-20 at 18:00 -0700, Davidlohr Bueso wrote: > On Fri, 20 May 2016, Waiman Long wrote: > > >I think mutex-debug.h also needs similar changes for completeness. > > Maybe, but given that with debug the wait_lock is unavoidable, doesn't > this send the wrong message? The mutex_set_owne

Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE

2016-05-23 Thread Jason Low
On Sat, 2016-05-21 at 09:04 -0700, Peter Hurley wrote: > On 05/18/2016 12:58 PM, Jason Low wrote: > > It should be fine to use the standard READ_ONCE here, even if it's just > > for documentation, as it's probably not going to cost anything in > > practice. It woul

[PATCH v2] locking/mutex: Set and clear owner using WRITE_ONCE()

2016-05-20 Thread Jason Low
use a partially written owner value.

Signed-off-by: Jason Low
Acked-by: Davidlohr Bueso
---
 kernel/locking/mutex-debug.h |  4 ++--
 kernel/locking/mutex.h       | 10 --
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/mutex-debug.h b/kernel/locking/mutex

Re: [PATCH] locking/mutex: Set and clear owner using WRITE_ONCE()

2016-05-20 Thread Jason Low
On Fri, 2016-05-20 at 16:27 -0400, Waiman Long wrote: > On 05/19/2016 06:23 PM, Jason Low wrote: > > The mutex owner can get read and written to without the wait_lock. > > Use WRITE_ONCE when setting and clearing the owner field in order > > to avoid optimizations such a

[PATCH] locking/mutex: Set and clear owner using WRITE_ONCE()

2016-05-19 Thread Jason Low
read and use a partially written owner value.

Signed-off-by: Jason Low
---
 kernel/locking/mutex.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h
index 5cda397..469b61e 100644
--- a/kernel/locking/mutex.h
+++ b/kernel
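
The change is small; per the description it amounts to roughly this (a sketch):

    static inline void mutex_set_owner(struct mutex *lock)
    {
            /* ->owner is read locklessly by optimistic spinners, so
             * forbid the compiler from tearing the store. */
            WRITE_ONCE(lock->owner, current);
    }

    static inline void mutex_clear_owner(struct mutex *lock)
    {
            WRITE_ONCE(lock->owner, NULL);
    }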

Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE

2016-05-19 Thread Jason Low
On Wed, 2016-05-18 at 12:58 -0700, Jason Low wrote: > On Wed, 2016-05-18 at 14:29 -0400, Waiman Long wrote: > > On 05/18/2016 01:21 PM, Jason Low wrote: > > > On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote: > > >> On Tue, 17 May 2016, Waiman Long wrote

Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE

2016-05-18 Thread Jason Low
On Wed, 2016-05-18 at 14:29 -0400, Waiman Long wrote: > On 05/18/2016 01:21 PM, Jason Low wrote: > > On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote: > >> On Tue, 17 May 2016, Waiman Long wrote: > >> > >>> Without using WRITE_ONCE(), the compiler c

Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE()

2016-05-18 Thread Jason Low
READ_ONCE() may > >not be needed for rwsem->owner as long as the value is only used for > >comparison and not dereferencing. > > > >Signed-off-by: Waiman Long > > Yes, ->owner can obviously be handled locklessly during optimistic > spinning. > > Acked-by: Davidlohr Bueso Acked-by: Jason Low

Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE

2016-05-18 Thread Jason Low
On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote: > On Tue, 17 May 2016, Waiman Long wrote: > > >Without using WRITE_ONCE(), the compiler can potentially break a > >write into multiple smaller ones (store tearing). So a read from the > >same data by another task concurrently may return a p
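
To illustrate the hazard being described (a minimal sketch, not code from the patch):

    /* A plain store may be split by the compiler into several narrower
     * stores ("store tearing"); a concurrent lockless reader could then
     * observe a half-written pointer. */
    sem->owner = current;                   /* may tear             */
    WRITE_ONCE(sem->owner, current);        /* one full-width store */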

Re: [RFC][PATCH 0/7] locking/rwsem: Convert rwsem count to atomic_long_t

2016-05-17 Thread Jason Low
On Tue, 2016-05-17 at 13:09 +0200, Peter Zijlstra wrote: > On Mon, May 16, 2016 at 06:12:25PM -0700, Linus Torvalds wrote: > > On Mon, May 16, 2016 at 5:37 PM, Jason Low wrote: > > > > > > This rest of the series converts the rwsem count variable to an > > >

[RFC][PATCH 7/7] locking,asm-generic: Remove generic rwsem add and rwsem update definitions

2016-05-16 Thread Jason Low
The rwsem count has been converted to an atomic variable and we now directly use atomic_long_add() and atomic_long_add_return() on the count, so we can remove the asm-generic implementation of rwsem_atomic_add() and rwsem_atomic_update(). Signed-off-by: Jason Low --- include/asm-generic/rwsem.h

[RFC][PATCH 6/7] locking,s390: Remove s390 rwsem add and rwsem update

2016-05-16 Thread Jason Low
The rwsem count has been converted to an atomic variable and the rwsem code now directly uses atomic_long_add() and atomic_long_add_return(), so we can remove the s390 implementation of rwsem_atomic_add() and rwsem_atomic_update(). Signed-off-by: Jason Low --- arch/s390/include/asm/rwsem.h | 37

[RFC][PATCH 3/7] locking,x86: Remove x86 rwsem add and rwsem update

2016-05-16 Thread Jason Low
The rwsem count has been converted to an atomic variable and the rwsem code now directly uses atomic_long_add() and atomic_long_add_return(), so we can remove the x86 implementation of rwsem_atomic_add() and rwsem_atomic_update(). Signed-off-by: Jason Low --- arch/x86/include/asm/rwsem.h | 18

[RFC][PATCH 4/7] locking,alpha: Remove Alpha rwsem add and rwsem update

2016-05-16 Thread Jason Low
The rwsem count has been converted to an atomic variable and the rwsem code now directly uses atomic_long_add() and atomic_long_add_return(), so we can remove the alpha implementation of rwsem_atomic_add() and rwsem_atomic_update(). Signed-off-by: Jason Low --- arch/alpha/include/asm/rwsem.h

[RFC][PATCH 5/7] locking,ia64: Remove ia64 rwsem add and rwsem update

2016-05-16 Thread Jason Low
The rwsem count has been converted to an atomic variable and the rwsem code now directly uses atomic_long_add() and atomic_long_add_return(), so we can remove the ia64 implementation of rwsem_atomic_add() and rwsem_atomic_update(). Signed-off-by: Jason Low --- arch/ia64/include/asm/rwsem.h | 7

[RFC][PATCH 2/7] locking/rwsem: Convert sem->count to atomic_long_t

2016-05-16 Thread Jason Low
add,update} definitions across the various architectures.

Suggested-by: Peter Zijlstra
Signed-off-by: Jason Low
---
 include/linux/rwsem.h       |  6 +++---
 kernel/locking/rwsem-xadd.c | 31 ---
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/include/linux

[RFC][PATCH 1/7] locking/rwsem: Optimize write lock by reducing operations in slowpath

2016-05-16 Thread Jason Low
operations. We can instead make the list_is_singular() check first, and then set the count accordingly, so that we issue at most 1 atomic operation when acquiring the write lock and reduce unnecessary cacheline contention. Signed-off-by: Jason Low Acked-by: Waiman Long Acked-by: Davidlohr Bueso

[RFC][PATCH 0/7] locking/rwsem: Convert rwsem count to atomic_long_t

2016-05-16 Thread Jason Low
The first patch contains an optimization for acquiring the rwsem write lock in the slowpath. The rest of the series converts the rwsem count variable to an atomic_long_t since it is used as an atomic variable. This allows us to also remove the rwsem_atomic_{add,update} abstraction and reduce 1

[PATCH v2] locking/rwsem: Optimize write lock by reducing operations in slowpath

2016-05-16 Thread Jason Low
operations. We can instead make the list_is_singular() check first, and then set the count accordingly, so that we issue at most 1 atomic operation when acquiring the write lock and reduce unnecessary cacheline contention. Signed-off-by: Jason Low Acked-by: Waiman Long Acked-by: Davidlohr Bueso

Re: [PATCH] locking/rwsem: Optimize write lock slowpath

2016-05-11 Thread Jason Low
On Wed, 2016-05-11 at 11:33 -0700, Davidlohr Bueso wrote: > On Wed, 11 May 2016, Peter Zijlstra wrote: > > >On Mon, May 09, 2016 at 12:16:37PM -0700, Jason Low wrote: > >> When acquiring the rwsem write lock in the slowpath, we first try > >> to set count to RW

Re: [PATCH] locking/rwsem: Optimize write lock slowpath

2016-05-11 Thread Jason Low
On Wed, 2016-05-11 at 13:49 +0200, Peter Zijlstra wrote: > On Mon, May 09, 2016 at 12:16:37PM -0700, Jason Low wrote: > > When acquiring the rwsem write lock in the slowpath, we first try > > to set count to RWSEM_WAITING_BIAS. When that is successful, > > we th

[PATCH] locking/rwsem: Optimize write lock slowpath

2016-05-09 Thread Jason Low
operations. We can instead make the list_is_singular() check first, and then set the count accordingly, so that we issue at most 1 atomic operation when acquiring the write lock and reduce unnecessary cacheline contention. Signed-off-by: Jason Low --- kernel/locking/rwsem-xadd.c | 20
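
Roughly, the acquisition step becomes the following (a sketch of the logic described above, assuming count == RWSEM_WAITING_BIAS on entry):

    /* Pick the final count value first, then take the lock with a
     * single cmpxchg instead of two atomic updates. */
    count = list_is_singular(&sem->wait_list) ?
                    RWSEM_ACTIVE_WRITE_BIAS :
                    RWSEM_ACTIVE_WRITE_BIAS + RWSEM_WAITING_BIAS;

    if (atomic_long_cmpxchg_acquire(&sem->count, RWSEM_WAITING_BIAS,
                                    count) == RWSEM_WAITING_BIAS)
            return true;    /* write lock acquired */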

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-09 Thread Jason Low
> 19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
> 14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
>
> The actual CPU cycles spent in rwsem_down_write_failed() dropped from
> 5.88% to 1.52% after the patch.
>
> The xfstests was also run and no regression was observed.
>
> Signed-off-by: Waiman Long

Acked-by: Jason Low

Re: [PATCH] locking/rwsem: Add reader owned state to the owner field

2016-05-04 Thread Jason Low
On Wed, 2016-05-04 at 13:27 -0400, Waiman Long wrote:
> On 05/03/2016 08:21 PM, Davidlohr Bueso wrote:
> > On Wed, 27 Apr 2016, Waiman Long wrote:
> >> static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
> >> @@ -378,7 +367,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *s

[PATCH v3] MCS spinlock: Use smp_cond_load_acquire() in spin loop

2016-04-20 Thread Jason Low
that it uses the new smp_cond_load_acquire() so that ARM64 can also override this spin loop with its own implementation using WFE. On x86, this can also be cheaper than spinning on smp_load_acquire(). Signed-off-by: Jason Low --- v2 -> v3: - Add additional comments about the use

Re: [RFC] arm64: Implement WFE based spin wait for MCS spinlocks

2016-04-20 Thread Jason Low
On Wed, 2016-04-20 at 12:30 +0200, Peter Zijlstra wrote: > On Thu, Apr 14, 2016 at 12:13:38AM -0700, Jason Low wrote: > > Use WFE to avoid most spinning with MCS spinlocks. This is implemented > > with the new cmpwait() mechanism for comparing and waiting for the MCS > > l

[RFC] arm64: Implement WFE based spin wait for MCS spinlocks

2016-04-14 Thread Jason Low
Use WFE to avoid most spinning with MCS spinlocks. This is implemented with the new cmpwait() mechanism for comparing and waiting for the MCS locked value to change using LDXR + WFE. Signed-off-by: Jason Low --- arch/arm64/include/asm/mcs_spinlock.h | 21 + 1 file changed

Re: [PATCH v2] MCS spinlock: Use smp_cond_load_acquire()

2016-04-13 Thread Jason Low
On Wed, 2016-04-13 at 10:43 -0700, Will Deacon wrote: > On Tue, Apr 12, 2016 at 08:02:17PM -0700, Jason Low wrote: > > For qspinlocks on ARM64, we would like to use WFE instead > > of purely spinning. Qspinlocks internally have lock > > contenders spin on an MCS

[PATCH v2] MCS spinlock: Use smp_cond_load_acquire()

2016-04-12 Thread Jason Low
implementation using WFE. On x86, it can also be cheaper to use this than spinning on smp_load_acquire(). Signed-off-by: Jason Low --- Changes from v1: - Pass l instead of &l to smp_cond_load_acquire() since l is already a pointer to the lock variable. kernel/locking/mcs_spinlock.h | 8 1

Re: [PATCH] MCS spinlock: Use smp_cond_load_acquire()

2016-04-12 Thread Jason Low
On Tue, 2016-04-12 at 16:40 -0700, Jason Low wrote: > On Wed, 2016-04-13 at 06:39 +0800, kbuild test robot wrote: > > Hi Jason, > > > > [auto build test ERROR on v4.6-rc3] > > [also build test ERROR on next-20160412] > > [cannot apply to tip/core/locking] >

Re: [PATCH] MCS spinlock: Use smp_cond_load_acquire()

2016-04-12 Thread Jason Low
improving the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Jason-Low/MCS-spinlock-Use-smp_cond_load_acquire/20160413-053726
> config: i386-randconfig-s0-201615 (attached as .config)
> reproduce:
>         # save the attached .config to linux build tree
>         &

[PATCH] MCS spinlock: Use smp_cond_load_acquire()

2016-04-12 Thread Jason Low
es the new smp_cond_load_acquire() so that ARM64 can also override this spin loop with its own implementation using WFE. On x86, it can also be cheaper to use this than spinning on smp_load_acquire(). Signed-off-by: Jason Low --- kernel/locking/mcs_spinlock.h | 8 1 file changed, 4 insert

Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()

2016-02-15 Thread Jason Low
On Mon, 2016-02-15 at 18:55 -0500, Waiman Long wrote:
> On 02/12/2016 03:40 PM, Peter Zijlstra wrote:
> > On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
> >> @@ -358,8 +373,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
> >>  }
> >>
> >>

Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()

2016-02-15 Thread Jason Low
On Mon, 2016-02-15 at 18:15 -0800, Jason Low wrote: > On Fri, 2016-02-12 at 14:14 -0800, Davidlohr Bueso wrote: > > On Fri, 12 Feb 2016, Peter Zijlstra wrote: > > > > >On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote: > > >> static bool mute

Re: [PATCH v2 1/4] locking/mutex: Add waiter parameter to mutex_optimistic_spin()

2016-02-15 Thread Jason Low
On Fri, 2016-02-12 at 14:14 -0800, Davidlohr Bueso wrote:
> On Fri, 12 Feb 2016, Peter Zijlstra wrote:
>
> >On Fri, Feb 12, 2016 at 12:32:12PM -0500, Waiman Long wrote:
> >> static bool mutex_optimistic_spin(struct mutex *lock,
> >> +                                  struct ww_acquire_ctx *ww_ctx,
> >>

Re: [PATCH 0/2] locking/mutex: Enable optimistic spinning of lock waiter

2016-02-09 Thread Jason Low
's patch:
>
> 1) Ding Tianhong still find that hanging task could happen in some cases.
> 2) Jason Low found that there was performance regression for some AIM7
>    workloads.

This might help address the hang that Ding reported. Performance wise, this patchset reduced AI

Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation

2016-02-04 Thread Jason Low
On Thu, 2016-02-04 at 16:55 +0800, huang ying wrote: > Hi, Low, > > On Thu, Feb 4, 2016 at 9:35 AM, Jason Low wrote: > > I've done some testing with this patch with some of the AIM7 workloads > > and found that this reduced throughput by about 10%. The reduction in

Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation

2016-02-03 Thread Jason Low
are:
>
> - waiters are on the wait list and need to be taken off
> - mutex_optimistic_spin() sets the lock->count to 0 on acquire
>   even though there might be more tasks on the wait list.
>
> Cc: Jason Low
> Cc: Ingo Molnar
> Cc: Tim Chen
> Cc: Linus Torvalds

Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-22 Thread Jason Low
On Fri, 2016-01-22 at 09:54 +0100, Peter Zijlstra wrote: > On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote: > > This patch attempts to fix this live-lock condition by enabling a > > woken task in the wait list to enter optimistic spinning loop itself > > with precedence over the one

Re: [PATCH] posix-cpu-timers: Merge running and checking_timer state in one field

2015-10-20 Thread Jason Low
On Tue, 2015-10-20 at 02:18 +0200, Frederic Weisbecker wrote: > This way we might consume less space in the signal struct (well, > depending on bool size or padding) and we don't need to worry about > ordering between the running and checking_timers fields. This looks fine to me. I ended up going

Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-16 Thread Jason Low
On Fri, 2015-10-16 at 09:12 +0200, Ingo Molnar wrote: > * Jason Low wrote: > > > > > With this patch set (along with commit 1018016c706f mentioned above), > > > > the performance hit of itimers almost completely goes away on the > > > > 16 so

Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-15 Thread Jason Low
On Thu, 2015-10-15 at 10:47 +0200, Ingo Molnar wrote: > * Jason Low wrote: > > > While running a database workload on a 16 socket machine, there were > > scalability issues related to itimers. The following link contains a > > more detailed summary of the issues

Re: [PATCH v2 0/4] timer: Improve itimers scalability

2015-10-15 Thread Jason Low
On Wed, 2015-10-14 at 17:18 -0400, George Spelvin wrote: > I'm going to give 4/4 a closer look to see if the races with timer > expiration make more sense to me than last time around. > (E.g. do CPU time signals even work in CONFIG_NO_HZ_FULL?) > > But although I haven't yet convinced myself the c

[tip:timers/core] posix_cpu_timer: Reduce unnecessary sighand lock contention

2015-10-15 Thread tip-bot for Jason Low
Commit-ID:  c8d75aa47dd585c9538a8205e9bb9847e12cfb84
Gitweb:     http://git.kernel.org/tip/c8d75aa47dd585c9538a8205e9bb9847e12cfb84
Author:     Jason Low
AuthorDate: Wed, 14 Oct 2015 12:07:56 -0700
Committer:  Thomas Gleixner
CommitDate: Thu, 15 Oct 2015 11:23:41 +0200

posix_cpu_timer: Reduce

[tip:timers/core] posix_cpu_timer: Convert cputimer->running to bool

2015-10-15 Thread tip-bot for Jason Low
Commit-ID:  d5c373eb5610686162ff50429f63f4c00c554799
Gitweb:     http://git.kernel.org/tip/d5c373eb5610686162ff50429f63f4c00c554799
Author:     Jason Low
AuthorDate: Wed, 14 Oct 2015 12:07:55 -0700
Committer:  Thomas Gleixner
CommitDate: Thu, 15 Oct 2015 11:23:41 +0200

posix_cpu_timer: Convert

[tip:timers/core] posix_cpu_timer: Check thread timers only when there are active thread timers

2015-10-15 Thread tip-bot for Jason Low
Commit-ID:  934715a191e4be0c602d39455a7a74316f274d60
Gitweb:     http://git.kernel.org/tip/934715a191e4be0c602d39455a7a74316f274d60
Author:     Jason Low
AuthorDate: Wed, 14 Oct 2015 12:07:54 -0700
Committer:  Thomas Gleixner
CommitDate: Thu, 15 Oct 2015 11:23:41 +0200

posix_cpu_timer: Check

[tip:timers/core] posix_cpu_timer: Optimize fastpath_timer_check()

2015-10-15 Thread tip-bot for Jason Low
Commit-ID:  7c177d994eb9637302b79e80d331f48dfbe26368
Gitweb:     http://git.kernel.org/tip/7c177d994eb9637302b79e80d331f48dfbe26368
Author:     Jason Low
AuthorDate: Wed, 14 Oct 2015 12:07:53 -0700
Committer:  Thomas Gleixner
CommitDate: Thu, 15 Oct 2015 11:23:41 +0200

posix_cpu_timer: Optimize

[PATCH v2 4/4] timer: Reduce unnecessary sighand lock contention

2015-10-14 Thread Jason Low
the thread_group_cputimer structure maintain a boolean to signify when a thread in the group is already checking for process wide timers, and adds extra logic in the fastpath to check the boolean. Signed-off-by: Jason Low Reviewed-by: Oleg Nesterov --- include/linux/init_task.h |1
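
The resulting fastpath test looks roughly like this (a sketch of the logic in the changelog):

    /* Skip the process-wide timer check (and eventually the sighand
     * lock) if another thread in the group is already checking. */
    if (READ_ONCE(sig->cputimer.running) &&
        !READ_ONCE(sig->cputimer.checking_timer)) {
            struct task_cputime group_sample;

            sample_cputime_atomic(&group_sample, &sig->cputimer.cputime_atomic);

            if (task_cputime_expired(&group_sample, &sig->cputime_expires))
                    return 1;
    }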

[PATCH v2 3/4] timer: Convert cputimer->running to bool

2015-10-14 Thread Jason Low
oleans. This is a preparatory patch to convert the existing running integer field to a boolean.

Suggested-by: George Spelvin
Signed-off-by: Jason Low
---
 include/linux/init_task.h |  2 +-
 include/linux/sched.h     |  6 +++---
 kernel/fork.c             |  2 +-
 kernel/time/pos

[PATCH v2 2/4] timer: Check thread timers only when there are active thread timers

2015-10-14 Thread Jason Low
there are no per-thread timers. As suggested by George, we can put the task_cputime_zero() check in check_thread_timers(), since that is more of an optimization to the function. Similarly, we move the existing check of cputimer->running to check_process_timers(). Signed-off-by: Jason Low Revie

[PATCH v2 0/4] timer: Improve itimers scalability

2015-10-14 Thread Jason Low
throughput by more than 30%. With this patch set (along with commit 1018016c706f mentioned above), the performance hit of itimers almost completely goes away on the 16 socket system.

Jason Low (4):
  timer: Optimize fastpath_timer_check()
  timer: Check thread timers only when there are active thread

[PATCH v2 1/4] timer: Optimize fastpath_timer_check()

2015-10-14 Thread Jason Low
timers set.

Signed-off-by: Jason Low
Reviewed-by: Oleg Nesterov
Reviewed-by: Frederic Weisbecker
Reviewed-by: Davidlohr Bueso
---
 kernel/time/posix-cpu-timers.c | 11 +++
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix
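
For reference, the per-thread part of the optimized check (as quoted in the review thread below; lightly reconstructed):

    if (!task_cputime_zero(&tsk->cputime_expires)) {
            struct task_cputime task_sample;
            cputime_t utime, stime;

            /* Compute utime/stime only when the task actually has
             * timers set, rather than unconditionally. */
            task_cputime(tsk, &utime, &stime);
            task_sample.utime = utime;
            task_sample.stime = stime;
            task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;

            if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
                    return 1;
    }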

Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()

2015-08-31 Thread Jason Low
On Mon, 2015-08-31 at 08:15 -0700, Davidlohr Bueso wrote: > On Tue, 2015-08-25 at 20:17 -0700, Jason Low wrote: > > In fastpath_timer_check(), the task_cputime() function is always > > called to compute the utime and stime values. However, this is not > > necessary if th

Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

2015-08-27 Thread Jason Low
On Thu, 2015-08-27 at 18:43 -0400, George Spelvin wrote: > Jason Low wrote: > > Frederic suggested that we just use a single "status" variable and > > access the bits for the running and checking field. I am leaning towards > > that method, so I might not include the

Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

2015-08-27 Thread Jason Low
On Wed, 2015-08-26 at 21:28 -0400, George Spelvin wrote: > > I can include your patch in the series and then use boolean for the new > > checking_timer field. However, it looks like this applies on an old > > kernel. For example, the spin_lock field has already been removed from > > the structure.

Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

2015-08-27 Thread Jason Low
On Thu, 2015-08-27 at 14:53 +0200, Frederic Weisbecker wrote: > On Wed, Aug 26, 2015 at 04:32:34PM -0700, Jason Low wrote: > > On Thu, 2015-08-27 at 00:56 +0200, Frederic Weisbecker wrote: > > > On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote: > > > >

Re: [PATCH v2] sched: fix nohz.next_balance update

2015-08-27 Thread Jason Low
> >
> > nohz_idle_balance must set the nohz.next_balance without taking into
> > account this_rq->next_balance which is not updated yet. Then, this_rq will
> > update nohz.next_update with its next_balance once updated and if necessary.
> >
> > Signed-off-by:

Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

2015-08-26 Thread Jason Low
On Wed, 2015-08-26 at 16:32 -0700, Jason Low wrote: > Perhaps to be safer, we use something like load_acquire() and > store_release() for accessing both the ->running and ->checking_timer > fields? Regarding using barriers, one option could be to pair them between sig->cputi
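
A minimal sketch of the pairing idea floated here (purely illustrative; the field names come from this thread, and this ordering scheme is one option rather than the posted patch):

    /* Slowpath: publish checking_timer before running becomes visible. */
    WRITE_ONCE(sig->cputimer.checking_timer, 1);
    smp_store_release(&sig->cputimer.running, 1);

    /* Fastpath: the acquire load orders the checking_timer read after it. */
    if (smp_load_acquire(&sig->cputimer.running) &&
        !READ_ONCE(sig->cputimer.checking_timer))
            /* go check process-wide timers */;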

Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

2015-08-26 Thread Jason Low
On Wed, 2015-08-26 at 15:33 -0400, George Spelvin wrote:
> And some more comments on the series...
>
> > @@ -626,6 +628,7 @@ struct task_cputime_atomic {
> >  struct thread_group_cputimer {
> >          struct task_cputime_atomic cputime_atomic;
> >          int running;
> > +        int checking_timer;
> >  };

Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

2015-08-26 Thread Jason Low
On Thu, 2015-08-27 at 00:56 +0200, Frederic Weisbecker wrote: > On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote: > > It was found while running a database workload on large systems that > > significant time was spent trying to acquire the sighand lock. > > &

Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

2015-08-26 Thread Jason Low
On Thu, 2015-08-27 at 00:31 +0200, Frederic Weisbecker wrote: > On Wed, Aug 26, 2015 at 10:53:35AM -0700, Linus Torvalds wrote: > > On Tue, Aug 25, 2015 at 8:17 PM, Jason Low wrote: > > > > > > This patch addresses this by having the thread_group_cputimer structure

Re: [PATCH 0/3] timer: Improve itimers scalability

2015-08-26 Thread Jason Low
On Wed, 2015-08-26 at 19:08 +0200, Oleg Nesterov wrote:
> On 08/26, Jason Low wrote:
> >
> > Hi Andrew,
> >
> > On Tue, 2015-08-25 at 20:27 -0700, Andrew Morton wrote:
> > > On Tue, 25 Aug 2015 20:17:45 -0700 Jason Low wrote:
> > > >

Re: [PATCH 2/3] timer: Check thread timers only when there are active thread timers

2015-08-26 Thread Jason Low
On Wed, 2015-08-26 at 13:04 -0400, George Spelvin wrote:
> -        check_thread_timers(tsk, &firing);
> +        if (!task_cputime_zero(&tsk->cputime_expires))
> +                check_thread_timers(tsk, &firing);
>
> Sincere question; I'm not certain myself: would it make more sense to put
> this shortcu

Re: [PATCH 1/3] timer: Optimize fastpath_timer_check()

2015-08-26 Thread Jason Low
On Wed, 2015-08-26 at 12:57 -0400, George Spelvin wrote:
> > if (!task_cputime_zero(&tsk->cputime_expires)) {
> >+        struct task_cputime task_sample;
> >+        cputime_t utime, stime;
> >+
> >+        task_cputime(tsk, &utime, &stime);
> >+        task_sample.utime = utim

Re: [PATCH 0/3] timer: Improve itimers scalability

2015-08-26 Thread Jason Low
Hi Andrew, On Tue, 2015-08-25 at 20:27 -0700, Andrew Morton wrote: > On Tue, 25 Aug 2015 20:17:45 -0700 Jason Low wrote: > > > When running a database workload on a 16 socket machine, there were > > scalability issues related to itimers. > > > > Commit 1018016c706
