On 4 June 2013 13:19, Frederic Weisbecker <fweis...@gmail.com> wrote:
> On Tue, Jun 04, 2013 at 01:11:47PM +0200, Vincent Guittot wrote:
>> On 4 June 2013 12:26, Frederic Weisbecker <fweis...@gmail.com> wrote:
>> > On Tue, Jun 04, 2013 at 11:36:11AM +0200, Peter Zijlstra wrote:
>> >>
>> >> The best I can seem to come up with is something like the below; but I
>> >> think its ghastly. Surely we can do something saner with that bit.
>> >>
>> >> Having to clear it at 3 different places is just wrong.
>> >
>> > We could clear the flag early in scheduler_ipi() and set some
>> > specific value in rq->idle_balance that tells we want nohz idle
>> > balancing from the softirq, something like this untested:
>>
>> I'm not sure that we can have less than 2 places to clear it: cancel
>> place or acknowledge place otherwise we can face a situation where
>> idle load balance will be triggered 2 consecutive times because
>> NOHZ_BALANCE_KICK will be cleared before the idle load balance has
>> been done and had a chance to migrate tasks.
>
> I guess it depends what is the minimum value of rq->next_balance, it seems
> to be large enough to avoid this kind of incident. Although I don't
> know well the whole logic with rq->next_balance and ilb trigger so I must
> defer to you.

In the trace that showed the issue, I can see that both CPU0 and CPU1
were trying to trigger ILB almost simultaneously, and the test_and_set
of NOHZ_BALANCE_KICK filtered out one of the requests. So I would say
that clearing the bit before the end of the idle load balance sequence
can generate such a sequence.

In the patch below, I have reduced the clearing of NOHZ_BALANCE_KICK to
2 places: acknowledge and cancel. I have reused part of Peter's
proposal, which clears the bit if the condition doesn't match, but I
have reordered the tests so that the clear is done only if all the other
conditions match.

 static inline bool got_nohz_idle_kick(void)
 {
-	int cpu = smp_processor_id();
-	return idle_cpu(cpu) && test_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
+	bool nohz_kick = test_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
+
+	if (!nohz_kick)
+		return false;
+
+	if (idle_cpu(cpu) && !need_resched())
+		return true;
+
+	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
+	return false;
 }
 
 #else /* CONFIG_NO_HZ_COMMON */
@@ -1393,8 +1401,9 @@ static void sched_ttwu_pending(void)
 
 void scheduler_ipi(void)
 {
-	if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick()
-			&& !tick_nohz_full_cpu(smp_processor_id()))
+	if (llist_empty(&this_rq()->wake_list)
+			&& !tick_nohz_full_cpu(smp_processor_id())
+			&& !got_nohz_idle_kick())
 		return;
 
 	/*
@@ -1417,7 +1426,7 @@ void scheduler_ipi(void)
 	/*
 	 * Check if someone kicked us for doing the nohz idle load balance.
 	 */
-	if (unlikely(got_nohz_idle_kick() && !need_resched())) {
+	if (unlikely(got_nohz_idle_kick())) {
 		this_rq()->idle_balance = 1;
 		raise_softirq_irqoff(SCHED_SOFTIRQ);
 	}
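
For illustration only, the life cycle of the bit described above (set by
the kicker with test_and_set, cleared on cancel or on acknowledge) can be
modelled in user space. This is a rough sketch, not kernel code: struct
cpu_model, try_kick(), got_kick(), ack_kick() and kick_scenario() are
hypothetical names, C11 atomics stand in for test_and_set_bit() and
clear_bit(), and the idle and need_resched fields stand in for
idle_cpu() and need_resched(). The CPU state is passed in explicitly;
the real got_nohz_idle_kick() would still need a cpu value from
smp_processor_id(), whose declaration the first hunk as quoted removes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical encoding: one flag bit, kept as a mask in this model. */
#define NOHZ_BALANCE_KICK 1u

/* Hypothetical stand-in for the per-CPU state the patch looks at. */
struct cpu_model {
    atomic_uint nohz_flags; /* models nohz_flags(cpu) */
    bool idle;              /* models idle_cpu(cpu) */
    bool need_resched;      /* models need_resched() */
};

/*
 * Kicker side. Models test_and_set_bit(): returns true only for the
 * caller that actually set the bit, so a concurrent second kicker is
 * filtered out and would not send another IPI.
 */
static bool try_kick(struct cpu_model *c)
{
    unsigned int old = atomic_fetch_or(&c->nohz_flags, NOHZ_BALANCE_KICK);

    return !(old & NOHZ_BALANCE_KICK);
}

/*
 * Target side, following the order of tests in the patch: no kick
 * pending, then kick usable, otherwise cancel by clearing the bit.
 */
static bool got_kick(struct cpu_model *c)
{
    if (!(atomic_load(&c->nohz_flags) & NOHZ_BALANCE_KICK))
        return false;

    if (c->idle && !c->need_resched)
        return true;

    /* Cancel place: the balance cannot run on this CPU this time. */
    atomic_fetch_and(&c->nohz_flags, ~NOHZ_BALANCE_KICK);
    return false;
}

/* Acknowledge place: cleared only once the idle load balance is done. */
static void ack_kick(struct cpu_model *c)
{
    atomic_fetch_and(&c->nohz_flags, ~NOHZ_BALANCE_KICK);
}

/* Walks the CPU0/CPU1 case from the trace: two near-simultaneous kicks. */
static void kick_scenario(void)
{
    struct cpu_model target = { .idle = true };

    assert(try_kick(&target));  /* CPU0 sets the bit and sends the IPI */
    assert(!try_kick(&target)); /* CPU1 is filtered, bit already set */
    assert(got_kick(&target));  /* target is idle, balance may run */
    assert(!try_kick(&target)); /* still filtered until acknowledged */
    ack_kick(&target);          /* balance finished */
    assert(try_kick(&target));  /* only now can a new kick go out */
}
```

In this model, clearing the bit anywhere between got_kick() and
ack_kick() would let the second try_kick() succeed while the first
balance is still pending, which is the double trigger discussed in the
thread.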