Re: [PATCH] sched/fair: Skip wake_affine() for core siblings

Kirill Tkhai Tue, 29 Sep 2015 09:15:57 -0700


On 29.09.2015 19:00, Kirill Tkhai wrote:
> 
> 
> On 29.09.2015 17:55, Mike Galbraith wrote:
>> On Mon, 2015-09-28 at 18:36 +0300, Kirill Tkhai wrote:
>>
>>> ---
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 4df37a4..dfbe06b 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4930,8 +4930,13 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>>>     int want_affine = 0;
>>>     int sync = wake_flags & WF_SYNC;
>>>  
>>> -   if (sd_flag & SD_BALANCE_WAKE)
>>> -           want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
>>> +   if (sd_flag & SD_BALANCE_WAKE) {
>>> +           want_affine = 1;
>>> +           if (cpu == prev_cpu || !cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
>>> +                   goto want_affine;
>>> +           if (wake_wide(p))
>>> +                   goto want_affine;
>>> +   }
>>
>> That blew wake_wide() right out of the water.
>>
>> It's not only about things like pgbench.  Drive multiple tasks in a Xen
>> guest (single event channel dom0 -> domu, and no select_idle_sibling()
>> to save the day) via network, and watch workers fail to be all they can
>> be because they keep being stacked up on the irq source.  Load balancing
>> yanks them apart, next irq stacks them right back up.  I met that in
>> enterprise land, thought wake_wide() should cure it, and indeed it did.
> 
> 1) Hm. The patch makes select_task_rq_fair() prefer the old cpu over the
> current one, doesn't it? More often than not we don't set affine_sd, so the
> part of the patch that was skipped in the quote selects prev_cpu.
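
To make the comparison in 1) concrete, here is a minimal userspace sketch (not kernel code) of the two conditions visible in the quoted hunk. The struct wake_ctx type, its field names, and both helper names are assumptions made for illustration. The allowed-CPU set is modelled as a plain 64-bit mask rather than a cpumask, and wake_wide() is reduced to a precomputed flag. What happens at the goto target is in the skipped part of the patch, so the sketch only reports whether the jump is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state select_task_rq_fair() looks at. */
struct wake_ctx {
	int cpu;            /* waking CPU, smp_processor_id() in the kernel */
	int prev_cpu;       /* CPU the wakee last ran on */
	uint64_t allowed;   /* bit n set: wakee may run on CPU n */
	bool wide;          /* what wake_wide(p) would have returned */
};

/* Models cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) on a 64-bit mask. */
static bool cpu_allowed(const struct wake_ctx *c)
{
	assert(c->cpu >= 0 && c->cpu < 64);
	return (c->allowed >> c->cpu) & 1;
}

/* Removed lines: want_affine = !wake_wide(p) && cpumask_test_cpu(...). */
static bool old_want_affine(const struct wake_ctx *c)
{
	return !c->wide && cpu_allowed(c);
}

/* Added lines: true when either "goto want_affine" would be taken. */
static bool new_hunk_jumps(const struct wake_ctx *c)
{
	if (c->cpu == c->prev_cpu || !cpu_allowed(c))
		return true;
	return c->wide;
}
```

In this model, for cpu != prev_cpu the jump is taken exactly when the old code left want_affine at 0. The only new case is cpu == prev_cpu, where the jump is taken regardless of wake_wide().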
> 
> 2) I thought about wakeups done by irq handlers and was even going to ask why
> we use affine logic for such wakeups. Device handlers usually aren't
> bound, and timers may migrate because of the NO_HZ logic. The only explanation
> I found is that unbound timers are a very unlikely case (I added a statistics
> printk to my local sched_debug to check that). But if we have situations like
> the one you described above, shouldn't we disable affine logic for
> in_interrupt() cases?
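
As a sketch of what 2) is asking, and nothing more: the userspace model below adds a hypothetical from_irq flag, standing in for an in_interrupt() check at wakeup time, to the pre-patch want_affine condition from the quoted diff. The struct, the flag, and the function name are all assumptions for illustration; no such check is part of the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; from_irq models in_interrupt() at wakeup time. */
struct wakeup {
	unsigned int cpu;   /* waking CPU (must be < 64 in this model) */
	uint64_t allowed;   /* bit n set: wakee may run on CPU n */
	bool wide;          /* what wake_wide(p) would have returned */
	bool from_irq;      /* wakeup issued from interrupt context */
};

/*
 * Pre-patch condition from the quoted diff, with one extra term:
 * a wakeup from interrupt context never asks for affine placement.
 */
static bool want_affine_no_irq(const struct wakeup *w)
{
	assert(w->cpu < 64);
	if (w->from_irq)
		return false;
	return !w->wide && ((w->allowed >> w->cpu) & 1);
}
```

In this model a wakeup issued by an irq source never requests placement near the irq's CPU. Whether that costs more than it gains for other interrupt-driven wakeups is exactly the open question.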
> 
> 3) I ask only because (being outside of scheduler history) it seems a little
> strange that we prefer smp_processor_id()'s sd_llc so much. The benefit of a
> sync wakeup is more or less clear: smp_processor_id()'s sd_llc may contain
> data that is interesting for the wakee, and this minimizes cache misses.
> But we do the same in other cases too, and at every migration we lose
> itlb, dtlb... Of course, this requires more accurate patches than the one posted


***typo: that should read instruction and data caches, not itlb, dtlb

> (not such crude patches).
> 
> Thanks,
> Kirill
> 