On Thu, Mar 31, 2016 at 08:42:55AM -0700, Paul E. McKenney wrote: > On Wed, Mar 30, 2016 at 07:55:47AM -0700, Paul E. McKenney wrote: > > On Tue, Mar 29, 2016 at 06:49:08AM -0700, Paul E. McKenney wrote: > > > On Mon, Mar 28, 2016 at 05:28:14PM -0700, Paul E. McKenney wrote: > > > > On Mon, Mar 28, 2016 at 05:25:18PM -0700, Paul E. McKenney wrote: > > > > > On Mon, Mar 28, 2016 at 06:08:41AM -0700, Paul E. McKenney wrote: > > > > > > On Mon, Mar 28, 2016 at 08:25:47AM +0200, Peter Zijlstra wrote: > > > > > > > On Sun, Mar 27, 2016 at 02:06:41PM -0700, Paul E. McKenney wrote: > > > > > > > > > > [ . . . ] > > > > > > > > > > > > > OK, so I should instrument migration_call() if I get the repro > > > > > > > > rate up? > > > > > > > > > > > > > > Can do, maybe try the below first. (yes I know how long it all > > > > > > > takes :/) > > > > > > > > > > > > OK, will run this today, then run calibration for last night's run > > > > > > this > > > > > > evening. > > > > > > And of 18 two-hour runs, there were five failures, or about 28%. > > > That said, I don't have even one significant digit on the failure rate, > > > as 5 of 18 is within the 95% confidence limits for a failure probability > > > as low as 12.5% and as high as 47%. > > > > And after last night's run, this is narrowed down to between 23% and 38%, > > which is close enough. Average is 30%, 18 failures in 60 runs. > > > > Next step is to test Peter's patch some more. Might take a couple of > > night's worth of runs to get statistical significance. After which > > it will be time to rebase to 4.6-rc1. > > And the first night was not so good: 6 failures out of 24 runs. Adding > this to the 1-of-10 earlier gets 7 failures of 34. Here are how things > stack up given the range of base failure estimates: > > Low 95% bound of 23%: 84% confidence. > > Actual measurement of 30%: 92% confidence. > > High 95% bound of 38%: 98% confidence. > > So there is still some chance that Peter's patch is helping. I will > run for one more evening, after which it will be time to move forward > to 4.6-rc1.
And no luck reducing bounds. However, moving to 4.6-rc1 did get some of the trace_printk() to print. The ftrace_dump()s resulted in RCU CPU stall warnings, and the dumps were truncated due to test timeouts in my scripting. (I need to make my scripts more patient when they see an ftrace dump in progress, I guess.) Here are the results: http://www2.rdrop.com/users/paulmck/submission/TREE03.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.1.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.2.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.3.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.4.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.5.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.6.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.7.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.8.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.9.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.11.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.12.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.13.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.14.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.15.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.16.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.17.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.18.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.19.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.20.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.21.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.22.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.23.console.log.tgz http://www2.rdrop.com/users/paulmck/submission/TREE03.24.console.log.tgz The config is here: http://www2.rdrop.com/users/paulmck/submission/config.tgz More runs to measure 4.6-rc1 base error rate... Thanx, Paul