On 01/16/2013 01:43 PM, Alex Shi wrote:
> -	/* while loop will break here if sd == NULL */
>>>
>>> I agree that this should be a major optimization. I just can't figure
>>> out why the existing recursive search for an idle cpu switches to the
>>> new cpu near the end and then starts a search for an idle cpu in the
>>> new cpu's domain. Is this to handle some exotic sched domain
>>> configurations? If so, they probably wouldn't work with your
>>> optimizations.
>>
>> Let me explain my understanding of why the recursive search is the way
>> it is.
>>
>>  _________________________ sd0
>> |                         |
>> |  ___sd1__    ___sd2__   |
>> | |        |  |        |  |
>> | |  sgx   |  |  sga   |  |
>> | |  sgy   |  |  sgb   |  |
>> | |________|  |________|  |
>> |_________________________|
>>
>> What the current recursive search does (assuming we start with sd0,
>> the top-level sched domain whose flags are rightly set) is this: we
>> find that sd1 is the idlest group, and that cpux1 in sgx is the
>> idlest cpu.
>>
>> We could ideally have stopped the search here. But the problem is that
>> sgx may be more loaded than sgy, meaning the cpus in sgx are heavily
>> imbalanced: say there are two cpus, cpux1 and cpux2, in sgx, where
>> cpux2 is heavily loaded and cpux1 has recently gone idle and load
>> balancing has not come to its rescue yet. According to the search
>> above, cpux1 is idle, but it is *not the right candidate for
>> scheduling the forked task; it is the right candidate for relieving
>> the load from cpux2* due to cache locality etc.
>
> The problem still exists in the current code. It still goes to cpux1,
> then goes up to sgx to seek the idlest group ... the idlest cpu, and
> comes back to cpux1 again. Nothing helps.
>
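To make the ping-pong concrete, this is the current descent, condensed
from the '-' lines of the hunk below (variables as in
select_task_rq_fair(); comments added to tie it to the sd0/sd1/sgx
picture, so not the verbatim fair.c source):

	while (sd) {
		int load_idx = sd->forkexec_idx;
		struct sched_group *group;
		int weight;

		if (!(sd->flags & sd_flag)) {
			sd = sd->child;
			continue;
		}

		if (sd_flag & SD_BALANCE_WAKE)
			load_idx = sd->wake_idx;

		/* at the sd0 level this picks the group spanning sd1 ... */
		group = find_idlest_group(sd, p, cpu, load_idx);
		if (!group) {
			sd = sd->child;
			continue;
		}

		/* ... and inside that group it picks cpux1 in sgx */
		new_cpu = find_idlest_cpu(group, p, cpu);
		if (new_cpu == -1 || new_cpu == cpu) {
			/* Now try balancing at a lower domain level of cpu */
			sd = sd->child;
			continue;
		}

		/*
		 * Switch to new_cpu (cpux1) and restart from the widest
		 * sd_flag domain of new_cpu that is narrower than the span
		 * just searched; in the picture above that re-enters sd1,
		 * where the next iteration can settle on cpux1 again.
		 */
		cpu = new_cpu;
		weight = sd->span_weight;
		sd = NULL;
		for_each_domain(cpu, tmp) {
			if (weight <= tmp->span_weight)
				break;
			if (tmp->flags & sd_flag)
				sd = tmp;
		}
		/* while loop will break here if sd == NULL */
	}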
To resolve the problem, I have tried to walk the domains from the top
down. But testing shows that aim9/hackbench performance is not good on
our SNB EP, and there is no change on the other platforms.

---
@@ -3351,51 +3363,33 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)

-	while (sd) {
+	for_each_lower_domain(sd) {
 		int load_idx = sd->forkexec_idx;
-		struct sched_group *group;
-		int weight;
-
-		if (!(sd->flags & sd_flag)) {
-			sd = sd->child;
-			continue;
-		}
+		int local = 0;
 
 		if (sd_flag & SD_BALANCE_WAKE)
 			load_idx = sd->wake_idx;
 
-		group = find_idlest_group(sd, p, cpu, load_idx);
-		if (!group) {
-			sd = sd->child;
-			continue;
-		}
-
-		new_cpu = find_idlest_cpu(group, p, cpu);
-		if (new_cpu == -1 || new_cpu == cpu) {
-			/* Now try balancing at a lower domain level of cpu */
-			sd = sd->child;
+		group = find_idlest_group(sd, p, cpu, load_idx, &local);
+		if (local)
 			continue;
-		}
+		if (!group)
+			goto unlock;
 
-		/* Now try balancing at a lower domain level of new_cpu */
-		cpu = new_cpu;
-		weight = sd->span_weight;
-		sd = NULL;
-		for_each_domain(cpu, tmp) {
-			if (weight <= tmp->span_weight)
-				break;
-			if (tmp->flags & sd_flag)
+		/* go down from non-local group */
+		for_each_domain(group_first_cpu(group), tmp)
+			if (cpumask_equal(sched_domain_span(tmp),
+					sched_group_cpus(group))) {
 				sd = tmp;
-		}
-		/* while loop will break here if sd == NULL */
+				break;
+			}
 	}
+	if (group)
+		new_cpu = find_idlest_cpu(group, p, cpu);
 unlock:
 	rcu_read_unlock();
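The find_idlest_group() side of the change is not quoted above. Here is
a minimal sketch of how the new 'local' parameter presumably behaves:
the body is essentially the mainline function from 3.8-era
kernel/sched/fair.c, and only the split return path at the end is my
guess at the unquoted part of the patch:

static struct sched_group *
find_idlest_group(struct sched_domain *sd, struct task_struct *p,
		  int this_cpu, int load_idx, int *local)
{
	struct sched_group *idlest = NULL, *group = sd->groups;
	unsigned long min_load = ULONG_MAX, this_load = 0;
	int imbalance = 100 + (sd->imbalance_pct - 100) / 2;

	do {
		unsigned long load, avg_load = 0;
		int local_group, i;

		/* Skip over this group if it has no CPUs allowed */
		if (!cpumask_intersects(sched_group_cpus(group),
					tsk_cpus_allowed(p)))
			continue;

		local_group = cpumask_test_cpu(this_cpu,
					       sched_group_cpus(group));

		/* Tally up the load of all CPUs in the group */
		for_each_cpu(i, sched_group_cpus(group)) {
			/* Bias balancing toward cpus of our domain */
			if (local_group)
				load = source_load(i, load_idx);
			else
				load = target_load(i, load_idx);
			avg_load += load;
		}

		/* Adjust by relative CPU power of the group */
		avg_load = (avg_load * SCHED_POWER_SCALE) / group->sgp->power;

		if (local_group) {
			this_load = avg_load;
		} else if (avg_load < min_load) {
			min_load = avg_load;
			idlest = group;
		}
	} while (group = group->next, group != sd->groups);

	/* no usable remote group at all: caller stops via goto unlock */
	if (!idlest)
		return NULL;

	/* local group is comparably idle: caller keeps walking down in it */
	if (100 * this_load < imbalance * min_load) {
		*local = 1;
		return NULL;
	}
	return idlest;
}

With that, 'if (local) continue;' in the loop above lets
for_each_lower_domain() (which walks sd toward the leaves via
sd->child) keep descending inside the local group, while a NULL return
without 'local' set means no better group exists anywhere, so we stop
at the cpu found so far. The hoisted declaration of 'group' that the
final 'if (group)' relies on is presumably in a part of the patch not
quoted here.

--
Thanks
    Alex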