On Thu, 2014-04-24 at 09:15 +0200, Peter Zijlstra wrote:
> On Wed, Apr 23, 2014 at 06:30:35PM -0700, Jason Low wrote:
> > It was found that when running some workloads (such as AIM7) on large
> > systems with many cores, CPUs do not remain idle for long. Thus, tasks
> > can wake up or get enqueued while we are doing idle balancing.
> >
> > In this patch, while traversing the domains in idle balance, in addition
> > to checking pulled_task, we add an extra check of this_rq->nr_running to
> > decide whether to stop searching for tasks to pull. If there are runnable
> > tasks on this rq, we stop traversing the domains. This reduces the chance
> > that idle balance delays a task from running.
> >
> > This patch resulted in approximately a 6% performance improvement when
> > running a Java server workload on an 8-socket machine.
> >
> > Signed-off-by: Jason Low <jason.l...@hp.com>
> > ---
> >  kernel/sched/fair.c |    8 ++++++--
> >  1 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 3e3ffb8..232518c 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6689,7 +6689,6 @@ static int idle_balance(struct rq *this_rq)
> >  		if (sd->flags & SD_BALANCE_NEWIDLE) {
> >  			t0 = sched_clock_cpu(this_cpu);
> >
> > -			/* If we've pulled tasks over stop searching: */
> >  			pulled_task = load_balance(this_cpu, this_rq,
> >  						   sd, CPU_NEWLY_IDLE,
> >  						   &continue_balancing);
> > @@ -6704,7 +6703,12 @@ static int idle_balance(struct rq *this_rq)
> >  		interval = msecs_to_jiffies(sd->balance_interval);
> >  		if (time_after(next_balance, sd->last_balance + interval))
> >  			next_balance = sd->last_balance + interval;
> > -		if (pulled_task)
> > +
> > +		/*
> > +		 * Stop searching for tasks to pull if there are
> > +		 * now runnable tasks on this rq.
> > +		 */
> > +		if (pulled_task || this_rq->nr_running > 0)
> >  			break;
> >  	}
> >  	rcu_read_unlock();
>
> There's also the CONFIG_PREEMPT bit in move_tasks(); does making that
> unconditional also help such a workload?
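[ Aside for anyone skimming the thread: the hunk quoted above boils down to
the loop shape sketched below. This is only a simplified userspace rendering
of the idea, not the kernel code; the struct layouts and the try_pull_from()
helper are invented for illustration, and the avg_idle/balance-cost
bookkeeping is left out entirely. ]

/*
 * Simplified sketch of the newly idle balance loop after the change above:
 * walk the sched domains, try to pull, and stop as soon as we either pulled
 * something or a task became runnable on this rq.  Types and try_pull_from()
 * are stand-ins, not the real kernel API.
 */
struct rq { int nr_running; };
struct sched_domain { struct sched_domain *parent; int newidle; };

/* Stand-in for load_balance(); a stub so the sketch compiles on its own. */
static int try_pull_from(struct sched_domain *sd, struct rq *this_rq)
{
	(void)sd;
	(void)this_rq;
	return 0;	/* pretend nothing was found to pull */
}

static int idle_balance_sketch(struct rq *this_rq, struct sched_domain *sd)
{
	int pulled_task = 0;

	for (; sd; sd = sd->parent) {
		if (sd->newidle)
			pulled_task = try_pull_from(sd, this_rq);

		/*
		 * A wakeup may have enqueued a task here while we were
		 * scanning; continuing to higher domains would only delay
		 * it, so bail out as soon as this rq has work to do.
		 */
		if (pulled_task || this_rq->nr_running > 0)
			break;
	}
	return pulled_task;
}

[ The point of the extra nr_running check is simply to bound how long a
freshly woken task can sit behind the remaining domain scan. ]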
If the below patch is what you were referring to, I believe this can help too.
This was also something I was testing out before we went with the patches that
compare avg_idle with the idle balance cost. I recall seeing somewhere around a
+7% performance improvement in at least one of the AIM7 workloads. I can do
some more testing with this.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 43232b8..d069054 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5304,7 +5304,6 @@ static int move_tasks(struct lb_env *env)
 		pulled++;
 		env->imbalance -= load;

-#ifdef CONFIG_PREEMPT
 		/*
 		 * NEWIDLE balancing is a source of latency, so preemptible
 		 * kernels will stop after the first task is pulled to minimize
@@ -5312,7 +5311,6 @@ static int move_tasks(struct lb_env *env)
 		 */
 		if (env->idle == CPU_NEWLY_IDLE)
 			break;
-#endif

 		/*
 		 * We only want to steal up to the prescribed amount of
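[ To spell out what dropping the #ifdef/#endif pair does, here is a rough
userspace sketch of the resulting behaviour; the lb_env_sketch and
move_tasks_sketch names are invented and the load accounting is heavily
simplified, so treat it as an illustration rather than the kernel code. With
the guard gone, CPU_NEWLY_IDLE balancing stops after the first pulled task on
every kernel configuration, not just preemptible ones. ]

/*
 * Sketch of move_tasks() behaviour with the CONFIG_PREEMPT guard removed:
 * newly idle balancing bails after the first pulled task unconditionally.
 * The cpu_idle_type constant names follow the kernel's; everything else
 * here is made up for the sketch.
 */
enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };

struct lb_env_sketch {
	enum cpu_idle_type idle;	/* what kind of balancing we are doing */
	int imbalance;			/* load we still want to move */
	int src_nr_movable;		/* movable tasks left on the busiest rq */
};

static int move_tasks_sketch(struct lb_env_sketch *env)
{
	int pulled = 0;

	while (env->src_nr_movable > 0 && env->imbalance > 0) {
		/* "detach" one task from the source runqueue */
		env->src_nr_movable--;
		env->imbalance--;
		pulled++;

		/*
		 * Newly idle balancing is latency sensitive, so one task is
		 * enough; with the guard removed this break now applies on
		 * every config, not only under CONFIG_PREEMPT.
		 */
		if (env->idle == CPU_NEWLY_IDLE)
			break;
	}
	return pulled;
}

[ The trade-off is the one the existing comment already describes: a shorter
critical section while the runqueue locks are held, at the cost of possibly
needing another balance pass to finish correcting the imbalance. ]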