On Sun, Nov 23, 2014 at 4:11 PM, Chris Mason <c...@fb.com> wrote:
On Sun, Nov 23, 2014 at 4:05 PM, Thomas Gleixner <t...@linutronix.de> wrote:
On Sun, 23 Nov 2014, Chris Mason wrote:
On Sun, Nov 23, 2014 at 11:32 AM, Borislav Petkov <b...@alien8.de> wrote:
> On Sun, Nov 23, 2014 at 11:16:51AM -0500, Chris Mason wrote:
> > It must be:
> >
> > commit 6e998916dfe327e785e7c2447959b2c1a3ea4930
> > Author: Stanislaw Gruszka <sgrus...@redhat.com>
> > Date: Wed Nov 12 16:58:44 2014 +0100
> >
> > sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency
> >
> > I'll do two runs to confirm, but it's the only related patch between rc5
> > and now.
I've added Ingo and Stanislaw to the cc. With
6e998916dfe327e785e7c2447959b2c1a3ea4930 reverted, I'm no longer crashing.
Repeating the stack trace for the new cc list. I see the crash with atop or
similar walkers of /proc racing against exiting programs. Given the NULL rip,
this line from the patch is probably broken, but it really feels like we
should be falling over on p->sched_class and not on the update_curr func.
+ p->sched_class->update_curr(rq);
I'm leaving my fork bomb running on two machines with the patch reverted to
make sure.
The sched_class instances which do not have update_curr are stop_task
and idle. Patch below.
I'm sure nobody thought about the stats read code path here.
[ 1053.759741] [<ffffffff81208348>] do_task_stat+0x8b8/0xb00
do_task_stat()
  thread_group_cputime_adjusted()
    thread_group_cputime()
      task_cputime()
      task_sched_runtime()
        if (task_current(rq, p) && task_on_rq_queued(p)) {
                update_rq_clock(rq);
                p->sched_class->update_curr(rq);
        }
Now if the stats are read for a stop machine task, aka 'migration/N',
while that task is current on its cpu: Ooops.
I added the callback for idle tasks as well for completeness' sake.
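For illustration only, here is a minimal userspace sketch of the failure mode described above. It is not the kernel code and not the actual patch: the struct layouts, the names stop_class_broken, stop_class_fixed, update_curr_noop, and model_task_sched_runtime(), and the -1 return value are all hypothetical. It is meant to show why the crash shows up as a NULL rip rather than as a fault on p->sched_class: the class pointer itself is valid, only its update_curr member is NULL, so the indirect call jumps to address 0.

```c
#include <stddef.h>

/* Simplified stand-ins; these are NOT the kernel's definitions. */
struct rq {
	unsigned long long clock;
	int update_calls;
};

struct sched_class {
	const char *name;
	void (*update_curr)(struct rq *rq); /* NULL if the class omits it */
};

struct task {
	const struct sched_class *sched_class;
	int is_current; /* models task_current(rq, p) */
	int on_rq;      /* models task_on_rq_queued(p) */
};

/* A class that does real accounting work in its callback. */
static void update_curr_fair(struct rq *rq)
{
	rq->update_calls++;
}

/* Hypothetical no-op callback for classes with nothing to account. */
static void update_curr_noop(struct rq *rq)
{
	(void)rq;
}

static const struct sched_class fair_class = { "fair", update_curr_fair };
static const struct sched_class stop_class_broken = { "stop", NULL };
static const struct sched_class stop_class_fixed = { "stop", update_curr_noop };

/*
 * Models the stats read path. Returns 0 on success and -1 where the
 * real code would call through a NULL function pointer. The kernel
 * has no such check at this point; it is here only so the sketch can
 * report the problem instead of crashing.
 */
static int model_task_sched_runtime(struct task *p, struct rq *rq)
{
	if (p->is_current && p->on_rq) {
		rq->clock++; /* stands in for update_rq_clock(rq) */
		if (!p->sched_class->update_curr)
			return -1; /* the kernel would jump to address 0 here */
		p->sched_class->update_curr(rq);
	}
	return 0;
}
```

In this model, a task using stop_class_broken that is current and queued makes model_task_sched_runtime() return -1, while the same task with stop_class_fixed returns 0. Giving every class a callback, even an empty one, keeps the call site free of NULL checks.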
This does make sense, but it doesn't match with the crash being much
more likely during the fork bomb. The difference is crashing within
a few hours vs crashing within 5 minutes.
But, maybe I just got lucky. I'll try the patch.
11 minutes later and it's still alive. I'll keep an eye on it and yell
if it falls over.
-chris