On Sun, 23 Nov 2014, Chris Mason wrote:
> On Sun, Nov 23, 2014 at 4:05 PM, Thomas Gleixner <t...@linutronix.de> wrote:
> > On Sun, 23 Nov 2014, Chris Mason wrote:
> > > On Sun, Nov 23, 2014 at 11:32 AM, Borislav Petkov <b...@alien8.de> wrote:
> > > > On Sun, Nov 23, 2014 at 11:16:51AM -0500, Chris Mason wrote:
> > > > > It must be:
> > > > >
> > > > > commit 6e998916dfe327e785e7c2447959b2c1a3ea4930
> > > > > Author: Stanislaw Gruszka <sgrus...@redhat.com>
> > > > > Date: Wed Nov 12 16:58:44 2014 +0100
> > > > >
> > > > > sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency
> > > > >
> > > > > I'll do two runs to confirm, but it's the only related patch between
> > > > > rc5 and now.
> > >
> > > I've adding Ingo and Stanislaw to the cc. With
> > > 6e998916dfe327e785e7c2447959b2c1a3ea4930 reverted, I'm no longer crashing.
> > >
> > > Repeating the stack trace for the new cc list. I see the crash with atop or
> > > similar walkers of /proc racing against exiting programs. Given the NULL rip,
> > > this line from the patch is probably broken, but it really feels like we
> > > should be falling over on p->sched_class and not on the update_curr func.
> > >
> > > + p->sched_class->update_curr(rq);
> > >
> > > I'm leaving my fork bomb running on two machines with the patch reverted to
> > > make sure.
> >
> > The sched_class instances which do not have update_curr are stop_task
> > and idle. Patch below.
> >
> > I'm sure nobody thought about the stats read code path here.
> >
> > [ 1053.759741] [<ffffffff81208348>] do_task_stat+0x8b8/0xb00
> >
> > do_task_stat(()
> > thread_group_cputime_adjusted()
> > thread_group_cputime()
> > task_cputime()
> > task_sched_runtime()
> > if (task_current(rq, p) && task_on_rq_queued(p)) {
> >         update_rq_clock(rq);
> >         p->sched_class->update_curr(rq);
> > }
> >
> > Now if the stats are read for a stomp machine task, aka 'migration/N'
> > and that task is current on its cpu. Ooops.
> >
> > I added the callback for idle tasks as well for completeness sake.
>
> This does make sense, but it doesn't match with the crash being much more
> likely during the fork bomb. The difference is crashing within a few hours vs
> crashing within 5 minutes.
The fork bomb will kick the migration task into life pretty often, so the
probability of do_task_stat() hitting a running migration thread is higher
than on a normally loaded machine.

Thanks,

	tglx
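For anyone following along, here is a rough userspace sketch of the failure mode discussed above: a class whose update_curr pointer was never set gets called through the stats read path. This is not the kernel code and not the actual patch. Every name ending in _model, the struct rq stand-in, and stats_read_update() are made up for illustration. It shows two ways the NULL call can be avoided in such a model: checking the pointer at the call site, or giving the class a no-op callback.

```c
#include <stddef.h>

/* Userspace model only: these types mimic the shape of a scheduling
 * class with an optional callback. They are NOT the kernel definitions. */
struct rq {
	unsigned long updates;	/* counts real update_curr work done */
};

struct sched_class_model {
	const char *name;
	void (*update_curr)(struct rq *rq);	/* NULL if the class never set it */
};

/* Stand-in for a class that does real runtime accounting. */
static void update_curr_fair_model(struct rq *rq)
{
	rq->updates++;
}

/* No-op callback: nothing to account, but safe to call. */
static void update_curr_noop_model(struct rq *rq)
{
	(void)rq;
}

/* A class with a real callback. */
static const struct sched_class_model fair_model = {
	"fair", update_curr_fair_model
};

/* A class that left the callback unset, like the crashing case. */
static const struct sched_class_model stop_unfixed_model = {
	"stop", NULL
};

/* The same class after being given a no-op callback. */
static const struct sched_class_model stop_fixed_model = {
	"stop", update_curr_noop_model
};

/*
 * Hypothetical model of the stats read path. An unguarded
 * class->update_curr(rq) would jump to address 0 for
 * stop_unfixed_model, which matches the NULL rip in the report.
 * Returns 1 if the callback was invoked, 0 if it was missing.
 */
static int stats_read_update(const struct sched_class_model *class,
			     struct rq *rq)
{
	if (!class->update_curr)
		return 0;
	class->update_curr(rq);
	return 1;
}
```

With the no-op callback in place the call site needs no NULL check at all, which keeps the read path free of an extra branch. The guard in stats_read_update() is only there so the model can show the unfixed class without crashing.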