On Sat, Jan 05, 2013 at 08:37:46AM +0000, Alex Shi wrote:
> If the wake/exec task is small enough, utils < 12.5%, it will
> have the chance to be packed into a cpu which is busy but still has
> space to handle it.
>
> Signed-off-by: Alex Shi <alex....@intel.com>
> ---
>  kernel/sched/fair.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 45 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8d0d3af..0596e81 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3471,19 +3471,57 @@ static inline int get_sd_sched_policy(struct sched_domain *sd,
>  }
>
>  /*
> + * find_leader_cpu - find the busiest cpu that still has enough leisure
> + * time among the cpus in the group.
> + */
> +static int
> +find_leader_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
> +{
> +	unsigned vacancy, min_vacancy = UINT_MAX;
unsigned int?

> +	int idlest = -1;
> +	int i;
> +	/* percentage of the task's util */
> +	unsigned putil = p->se.avg.runnable_avg_sum * 100
> +				/ (p->se.avg.runnable_avg_period + 1);

Alternatively you could use se.avg.load_avg_contrib, which is the same
ratio scaled by the task priority (se->load.weight). In the above
expression you don't take priority into account.

> +
> +	/* Traverse only the allowed CPUs */
> +	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
> +		struct rq *rq = cpu_rq(i);
> +		int nr_running = rq->nr_running > 0 ? rq->nr_running : 1;
> +
> +		/* only pack task which putil < 12.5% */
> +		vacancy = FULL_UTIL - (rq->util * nr_running + putil * 8);

I can't follow this expression. The variables can have the following
values:

	FULL_UTIL  = 99
	rq->util   = [0..99]
	nr_running = [1..inf]
	putil      = [0..99]

Why multiply rq->util by nr_running? Take an example where
rq->util = 50, nr_running = 2, and putil = 10. In this case the value of
putil doesn't really matter, as vacancy would be negative anyway:
FULL_UTIL - rq->util * nr_running is -1. However, with rq->util = 50
there should be plenty of spare cpu time to take another task.

Also, why multiply putil by 8? rq->util must be very close to 0 for
vacancy to be positive if putil is close to 12 (12.5%).

The vacancy variable is declared unsigned, so it will underflow instead
of becoming negative. Is this intentional?

I may be missing something, but could the expression be something like
the below instead? Create a putil < 12.5% check before the loop; there
is no reason to recheck it every iteration. Then:

	vacancy = FULL_UTIL - (rq->util + putil)

should be enough?

> +
> +		/* bias toward local cpu */
> +		if (vacancy > 0 && (i == this_cpu))
> +			return i;
> +
> +		if (vacancy > 0 && vacancy < min_vacancy) {
> +			min_vacancy = vacancy;
> +			idlest = i;

"idlest" may be a bit misleading here, as you actually select the
busiest cpu that has enough spare capacity to take the task.
Morten

> +		}
> +	}
> +	return idlest;
> +}
> +
> +/*
>   * If power policy is eligible for this domain, and it has task allowed
>   * cpu, we will select a CPU from this domain.
>   */
>  static int get_cpu_for_power_policy(struct sched_domain *sd, int cpu,
> -		struct task_struct *p, struct sd_lb_stats *sds)
> +		struct task_struct *p, struct sd_lb_stats *sds, int fork)
>  {
>  	int policy;
>  	int new_cpu = -1;
>
>  	policy = get_sd_sched_policy(sd, cpu, p, sds);
> -	if (policy != SCHED_POLICY_PERFORMANCE && sds->group_leader)
> -		new_cpu = find_idlest_cpu(sds->group_leader, p, cpu);
> -
> +	if (policy != SCHED_POLICY_PERFORMANCE && sds->group_leader) {
> +		if (!fork)
> +			new_cpu = find_leader_cpu(sds->group_leader, p, cpu);
> +		/* for fork balancing and a little busy task */
> +		if (new_cpu == -1)
> +			new_cpu = find_idlest_cpu(sds->group_leader, p, cpu);
> +	}
>  	return new_cpu;
>  }
>
> @@ -3534,14 +3572,15 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int flags)
>  		if (tmp->flags & sd_flag) {
>  			sd = tmp;
>
> -			new_cpu = get_cpu_for_power_policy(sd, cpu, p, &sds);
> +			new_cpu = get_cpu_for_power_policy(sd, cpu, p, &sds,
> +					flags & SD_BALANCE_FORK);
>  			if (new_cpu != -1)
>  				goto unlock;
>  		}
>  	}
>
>  	if (affine_sd) {
> -		new_cpu = get_cpu_for_power_policy(affine_sd, cpu, p, &sds);
> +		new_cpu = get_cpu_for_power_policy(affine_sd, cpu, p, &sds, 0);
>  		if (new_cpu != -1)
>  			goto unlock;
>
> --
> 1.7.12
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/