On Tue, Feb 09, 2016 at 09:11:55PM +1100, Ross Green wrote:
> Continued testing with the latest linux-4.5-rc3 release.
>
> Please find attached a copy of traces from dmesg:
>
> There is a lot more debug and trace data so hopefully this will shed
> some light on what might be happening here.
>
> My testing remains run a series of simple benchmarks, let that run to
> completion and then leave the system idle away with just a few daemons
> running.
>
> the self detected stalls in this instance turned up after a days run time.
> There were NO heavy artificial computational loads on the machine.

It does indeed look quiet on that dmesg for a good long time.

The following insanely crude not-for-mainline hack -might- be producing
good results in my testing.  It will take some time before I can claim
statistically different results.  But please feel free to give it a go
in the meantime.

(Thanks to Al Viro for pointing me in this direction.)

							Thanx, Paul

------------------------------------------------------------------------

commit 0c2c8d9fd1641809830a7a75f84dcad69936ef56
Author: Paul E. McKenney <paul...@linux.vnet.ibm.com>
Date:   Tue Feb 16 15:42:36 2016 -0800

    rcu: Crude exploratory hack

    Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 507d0ed48b97..5928e084620d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2194,8 +2194,10 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("fqswait"));
 			rsp->gp_state = RCU_GP_WAIT_FQS;
-			ret = wait_event_interruptible_timeout(rsp->gp_wq,
-					rcu_gp_fqs_check_wake(rsp, &gf), j);
+			ret = schedule_timeout_interruptible(j > 0 ? j : 1);
+			rcu_gp_fqs_check_wake(rsp, &gf);
+			// ret = wait_event_interruptible_timeout(rsp->gp_wq,
+			//		rcu_gp_fqs_check_wake(rsp, &gf), j);
 			rsp->gp_state = RCU_GP_DOING_FQS;
 			/* Locking provides needed memory barriers. */
 			/* If grace period done, leave loop. */