On 29.09.2015 19:00, Kirill Tkhai wrote:
>
> On 29.09.2015 17:55, Mike Galbraith wrote:
>> On Mon, 2015-09-28 at 18:36 +0300, Kirill Tkhai wrote:
>>
>>> ---
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 4df37a4..dfbe06b 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4930,8 +4930,13 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>>>  	int want_affine = 0;
>>>  	int sync = wake_flags & WF_SYNC;
>>>  
>>> -	if (sd_flag & SD_BALANCE_WAKE)
>>> -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
>>> +	if (sd_flag & SD_BALANCE_WAKE) {
>>> +		want_affine = 1;
>>> +		if (cpu == prev_cpu || !cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
>>> +			goto want_affine;
>>> +		if (wake_wide(p))
>>> +			goto want_affine;
>>> +	}
>>
>> That blew wake_wide() right out of the water.
>>
>> It's not only about things like pgbench. Drive multiple tasks in a Xen
>> guest (single event channel dom0 -> domu, and no select_idle_sibling()
>> to save the day) via network, and watch workers fail to be all they can
>> be because they keep being stacked up on the irq source. Load balancing
>> yanks them apart, next irq stacks them right back up. I met that in
>> enterprise land, thought wake_wide() should cure it, and indeed it did.
>
> 1) Hm.. The patch makes select_task_rq_fair() prefer the old cpu over the
> current one, doesn't it? More often than before, we don't set affine_sd,
> so the part of the patch skipped in the quote selects prev_cpu.
>
> 2) I thought about wakeups from irq handlers and was even going to ask why
> we use affine logic for such wakeups. Device handlers usually aren't
> bound, and timers may migrate since the NO_HZ logic is present. The only
> explanation I found is that unbound timers are a very unlikely case (I
> added a statistics printk to my local sched_debug to check that).
> But if we have situations like the one you described above, don't we have
> to disable affine logic for in_interrupt() cases?
>
> 3) I ask just because (being outside of the scheduler's history) it seems a
> little strange that we prefer smp_processor_id()'s sd_llc so much. The
> profit of sync wakeups is more or less clear: smp_processor_id()'s sd_llc
> may contain data the wakee is interested in, which minimizes cache misses.
> But we do the same in other cases too, and at every migration we lose
> itlb, dtlb... Of course, this requires more careful patches than the one posted
*** Typo: "itlb, dtlb" should read "instruction and data caches".
> (not so rude patches).
>
> Thanks,
> Kirill