On Sat, Dec 13, 2014 at 08:36:34AM +0100, Ingo Molnar wrote:
> 
> * Linus Torvalds <torva...@linux-foundation.org> wrote:
> 
> > I'm also not sure if the bug ever happens with preemption 
> > disabled. Sasha, was that you who reported that you cannot 
> > reproduce it without preemption? It strikes me that there's a 
> > race condition in __cond_resched() wrt preemption, for example: 
> > we do
> > 
> > 	__preempt_count_add(PREEMPT_ACTIVE);
> > 	__schedule();
> > 	__preempt_count_sub(PREEMPT_ACTIVE);
> > 
> > and in between the __schedule() and __preempt_count_sub(), if 
> > an interrupt comes in and wakes up some important process, it 
> > won't reschedule (because preemption is active), but then we 
> > enable preemption again and don't check whether we should 
> > reschedule (again), and we just go on our merry ways.
> 
> Indeed, that's a really good find regardless of whether it's the 
> source of these lockups - the (untested) patch below ought to 
> cure that.
> 
> > Now, I don't see how that could really matter for a long time - 
> > returning to user space will check need_resched, and sleeping 
> > will obviously force a reschedule anyway, so these kinds of 
> > races should at most delay things by just a tiny amount, but 
> > maybe there is some case where we screw up in a bigger way. So 
> > I do *not* believe that the one in __cond_resched() matters, 
> > but I'm giving it as an example of the kind of things that 
> > could go wrong.
> 
> (as you later note) NOHZ is somewhat special in this regard, 
> because there we try really hard not to run anything 
> periodically, so a lost reschedule will matter more.
> 
> But ... I'd be surprised if this patch made a difference: it 
> should normally not be possible to go idle with tasks on the 
> runqueue (even with this bug present), and with at least one busy 
> task on the CPU we get the regular scheduler tick which ought to 
> hide such latencies.
> 
> It's nevertheless a good thing to fix, I'm just not sure it's the 
> root cause of the observed lockup here.
> 
> Thanks,
> 
> 	Ingo
> 
> --
> 
> Reported-by: Linus Torvalds <torva...@linux-foundation.org>
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bb398c0c5f08..532809aa0544 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4207,6 +4207,8 @@ static void __cond_resched(void)
>  	__preempt_count_add(PREEMPT_ACTIVE);
>  	__schedule();
>  	__preempt_count_sub(PREEMPT_ACTIVE);
> +	if (need_resched())
> +		__schedule();
>  }
Nice catch! This indeed matters a lot for full nohz, where a lost reschedule interrupt might be ignored and never fixed up by a nearby tick. And even when a tick does fix it up, a reschedule delayed by up to HZ is a latency issue in itself.

In any case, the above __schedule() should probably stay wrapped in PREEMPT_ACTIVE so it remains a preemption point, to make sure that a task in TASK_[UN]INTERRUPTIBLE state is handled as expected and not deactivated early. Such as:

Signed-off-by: Frederic Weisbecker <fweis...@gmail.com>
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 240157c..6e942f3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2922,6 +2922,21 @@ void __sched schedule_preempt_disabled(void)
 	preempt_disable();
 }
 
+static void __preempt_schedule(void)
+{
+	do {
+		__preempt_count_add(PREEMPT_ACTIVE);
+		__schedule();
+		__preempt_count_sub(PREEMPT_ACTIVE);
+
+		/*
+		 * Check again in case we missed a preemption opportunity
+		 * between schedule and now.
+		 */
+		barrier();
+	} while (need_resched());
+}
+
 #ifdef CONFIG_PREEMPT
 /*
  * this is the entry point to schedule() from in-kernel preemption
@@ -2937,17 +2952,7 @@ asmlinkage __visible void __sched notrace preempt_schedule(void)
 	if (likely(!preemptible()))
 		return;
 
-	do {
-		__preempt_count_add(PREEMPT_ACTIVE);
-		__schedule();
-		__preempt_count_sub(PREEMPT_ACTIVE);
-
-		/*
-		 * Check again in case we missed a preemption opportunity
-		 * between schedule and now.
-		 */
-		barrier();
-	} while (need_resched());
+	__preempt_schedule();
 }
 NOKPROBE_SYMBOL(preempt_schedule);
 EXPORT_SYMBOL(preempt_schedule);
@@ -4249,9 +4254,7 @@ SYSCALL_DEFINE0(sched_yield)
 static void __cond_resched(void)
 {
-	__preempt_count_add(PREEMPT_ACTIVE);
-	__schedule();
-	__preempt_count_sub(PREEMPT_ACTIVE);
+	__preempt_schedule();
 }
 
 int __sched _cond_resched(void)