Re: [patch v3.18+ regression fix] sched: Further improve spurious CPU_IDLE active migrations

Vincent Guittot Thu, 01 Sep 2016 01:11:18 -0700

On 1 September 2016 at 06:11, Mike Galbraith <[email protected]> wrote:
> On Wed, 2016-08-31 at 17:52 +0200, Vincent Guittot wrote:
>> On 31 August 2016 at 12:36, Mike Galbraith <[email protected]> wrote:
>> > On Wed, 2016-08-31 at 12:18 +0200, Mike Galbraith wrote:
>> > > On Wed, 2016-08-31 at 12:01 +0200, Peter Zijlstra wrote:
>> >
>> > > > So 43f4d66637bc ("sched: Improve sysbench performance by fixing spurious
>> > > > active migration")'s +1 made sense in that it's a tie breaker. If you
>> > > > have 3 tasks on 2 groups, one group will have to have 2 tasks, and
>> > > > bouncing the one task around just isn't going to help _anything_.
>> > >
>> > > Yeah, but frequently tasks don't come in ones, so you end up with an
>> > > endless tug of war between LB ripping communicating buddies apart, and
>> > > select_idle_sibling() pulling them back together... bouncing cow
>> > > syndrome.
>> >
>>
>> Replacing +1 with +2 fixes this use case, which involves 2 threads, but
>> similar behavior can happen with, for example, 3 tasks on a system with
>> 4 cores per MC.
>>
>> IIUC, you have:
>> - on one side, the periodic load balance that spreads the 2 tasks across the system;
>> - on the other side, the wakeup path that moves the task back into the same MC.
>
> Yup.
>
>> Isn't your regression more related to spurious migration than to where the
>> task is scheduled? I don't see any direct relation between the client
>> and the server in this netperf test; is there one?
>
>          netperf  4360 [004]  1207.865265:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>          netperf  4360 [004]  1207.865274:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>          netperf  4360 [004]  1207.865280:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>        netserver  4361 [002]  1207.865313:       sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
>          netperf  4360 [004]  1207.865340:       sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
>          netperf  4360 [004]  1207.865345:       sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
>          netperf  4360 [004]  1207.865355:       sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
>          netperf  4360 [004]  1207.865357:       sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
>          netperf  4360 [004]  1207.865369:       sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
>        netserver  4361 [002]  1207.865377:       sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
>          netperf  4360 [004]  1207.865476:       sched:sched_wakeup: perf:4359 [120] success=1 CPU:003


I would have expected a net_rx softirq in the middle.
Never mind, I agree that we can find a lot of use cases with communicating tasks.

>
> It's not limited to this load, anything at all that is communicating
> will do the same on these or similar processors.
>
> This trying to be perfect looks like a booboo to me, as we are now
> specifically asking our left hand to undo what our right hand did to crank
> up throughput.  For the diagnosed processor at least, one of those
> hands definitely wants to be slapped.
>
> This doesn't seem to be an issue for L3-equipped CPUs, but perhaps it is
> for some even modern processors, dunno (the boxen where the regression was
> detected are far from new).
>
>> We could either remove the condition that tries to keep an even
>> number of tasks in each group until the busiest group becomes overloaded,
>> but that means unrelated tasks may have to share the same resources,
>> or we could try to prevent the migration at wakeup. I was looking at
>> wake_affine, which seems to choose the local CPU when both the prev and
>> local CPUs are idle. I wonder if the local CPU is really a better choice
>> when both are idle.
>
> I don't see a great alternative to turning it off off the top of my
> head, at least for processors with multiple LLCs.  Yeah, unrelated
> tasks could end up sharing a cache needlessly, but will that hurt as
> badly as tasks not munching tasty hot data definitely does?

Memory-intensive tasks will probably be hurt.

>
>         -Mike
