On 30 March 2015 at 14:24, Peter Zijlstra <pet...@infradead.org> wrote: > On Mon, Mar 30, 2015 at 01:03:03PM +0100, Morten Rasmussen wrote: >> On Mon, Mar 30, 2015 at 12:06:32PM +0100, Peter Zijlstra wrote: >> > On Fri, Mar 27, 2015 at 05:56:51PM +0000, Morten Rasmussen wrote: >> > >> > > I agree that it is hard to predict how many additional cpus you need, >> > > but I don't think you necessarily need that information as long as you >> > > start by filling up the cpu that was kicked to do the >> > > nohz_idle_balance() first. >> > >> > > Reducing unnecessary wakeups is quite important for energy consumption >> > > and something a lot of effort is put into. You really don't want to wake >> > > up another cluster/package unnecessarily just because there was only one >> > > nohz-idle cpu left in the previous one which could have handled the >> > > additional load. It gets even worse if the other cluster is less >> > > energy-efficient (big.LITTLE). >> > >> > So the only way to get tasks to cross your cluster is by balancing that >> > domain. At this point we'll compute sg stats for either group >> > (=cluster). >> > >> > The only thing we need to ensure is that it doesn't view the small >> > cluster as overloaded (as long as it really isn't of course), as long as >> > its not viewed as overloaded it will not pull tasks from it into the big >> > cluster, no matter how many ILBs we run before the ILB duty cpu's >> > rebalance_domains() call. >> > >> > I'm really not seeing the problem here. >> >> I see. The group_classify() should take care of it in all cases of >> balancing across clusters. You would be iterating over all cpus in the >> other cluster running rebalance_domains() if the balancer cpu happens to >> be the last one in the little cluster though. However, within the >> cluster (in case you have 2 or more nohz-idle cpus) you still take a >> double hit. No? > > It can yes, but typically not I think. This all could use some 'help' > for sure. > > So the thing is, find_new_ilb() simply selects the first idle_cpus_mask > cpu, while at the same time, nohz_idle_balance() will iterate the > idle_cpus_mask with the first, being first (obviously). > > So it is very like that if we migrate on the ILB it is in fact to the > current CPU. > > In case we cannot, we have no choice but to wake up a second idle, > nothing really to be done about that. > > To put it another way, for ILB purposes the rebalance_domains() call is > mostly superfluous. The only other case is if the selected ILB target > became non-idle between being selected and getting to run the softirq > handler. At which point we should wake another anyhow too. > > Maybe something like the below helps -- albeit it could use a comment > too of course. > > --- > kernel/sched/fair.c | 15 ++++++++++----- > 1 file changed, 10 insertions(+), 5 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index fdae26eb7218..b879d4b3b599 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -7624,11 +7624,12 @@ static void rebalance_domains(struct rq *rq, enum > cpu_idle_type idle) > * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the > * rebalancing for all the cpus for whom scheduler ticks are stopped. > */ > -static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) > +static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) > { > int this_cpu = this_rq->cpu; > - struct rq *rq; > int balance_cpu; > + struct rq *rq; > + bool done = false; > > if (idle != CPU_IDLE || > !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu))) > @@ -7647,6 +7648,8 @@ static void nohz_idle_balance(struct rq *this_rq, enum > cpu_idle_type idle) > break; > > rq = cpu_rq(balance_cpu); > + if (rq == this_rq) > + done = true;
AFAICT, this can't happen because we start the for_each _cpu loop with: if (balance_cpu == this_cpu || !idle_cpu(balance_cpu)) continue; > > /* > * If time for next balance is due, > @@ -7666,6 +7669,8 @@ static void nohz_idle_balance(struct rq *this_rq, enum > cpu_idle_type idle) > nohz.next_balance = this_rq->next_balance; > end: > clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)); > + > + return done; > } > > /* > @@ -7744,7 +7749,7 @@ static inline bool nohz_kick_needed(struct rq *rq) > return kick; > } > #else > -static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) { > } > +static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) { > return false; } > #endif > > /* > @@ -7765,8 +7770,8 @@ static void run_rebalance_domains(struct softirq_action > *h) > * load balance only within the local sched_domain hierarchy > * and abort nohz_idle_balance altogether if we pull some load. > */ > - nohz_idle_balance(this_rq, idle); > - rebalance_domains(this_rq, idle); > + if (!nohz_idle_balance(this_rq, idle)) > + rebalance_domains(this_rq, idle); the nohz_idle_balance run rebalance_domains for all CPU except this CPU > } > > /* -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/