On 1 September 2016 at 06:11, Mike Galbraith <umgwanakikb...@gmail.com> wrote:
> On Wed, 2016-08-31 at 17:52 +0200, Vincent Guittot wrote:
>> On 31 August 2016 at 12:36, Mike Galbraith <umgwanakikb...@gmail.com> wrote:
>> > On Wed, 2016-08-31 at 12:18 +0200, Mike Galbraith wrote:
>> > > On Wed, 2016-08-31 at 12:01 +0200, Peter Zijlstra wrote:
>> >
>> > > > So 43f4d66637bc ("sched: Improve sysbench performance by fixing
>> > > > spurious
>> > > > active migration") 's +1 made sense in that its a tie breaker. If you
>> > > > have 3 tasks on 2 groups, one group will have to have 2 tasks, and
>> > > > bouncing the one task around just isn't going to help _anything_.
>> > >
>> > > Yeah, but frequently tasks don't come in ones, so, you end up with an
>> > > endless tug of war between LB ripping communicating buddies apart, and
>> > > select_idle_sibling() pulling them back together.. bouncing cow
>> > > syndrome.
>> >
>>
>> replacing +1 by +2 fixes this use case that involves 2 threads but
>> similar behavior can happen with 3 tasks on system with 4 cores per MC
>> as an example
>>
>> IIUC, you have on
>> - one side, periodic load balance that spreads the 2 tasks in the system
>> - on the other side, wake up path that moves the task back in the same MC.
>
> Yup.
>
>> Isn't your regression more linked to spurious migration than where the
>> task is scheduled ? I don't see any direct relation between the client
>> and the server in this netperf test, isn't it ?
>
> netperf   4360 [004] 1207.865265: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netperf   4360 [004] 1207.865274: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netperf   4360 [004] 1207.865280: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netserver 4361 [002] 1207.865313: sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
> netperf   4360 [004] 1207.865340: sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
> netperf   4360 [004] 1207.865345: sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
> netperf   4360 [004] 1207.865355: sched:sched_wakeup: kworker/u16:5:90 [120] success=1 CPU:006
> netperf   4360 [004] 1207.865357: sched:sched_wakeup: kworker/u16:4:89 [120] success=1 CPU:000
> netperf   4360 [004] 1207.865369: sched:sched_wakeup: netserver:4361 [120] success=1 CPU:002
> netserver 4361 [002] 1207.865377: sched:sched_wakeup: netperf:4360 [120] success=1 CPU:004
> netperf   4360 [004] 1207.865476: sched:sched_wakeup: perf:4359 [120] success=1 CPU:003

I would have expected a net_rx softirq in the middle. Never mind, I
agree that we can find a lot of use cases with communicating tasks.

>
> It's not limited to this load, anything at all that is communicating
> will do the same on these or similar processors.
>
> This trying to be perfect looks like a booboo to me, as we are now
> specifically asking our left hand undo what our right hand did to crank
> up throughput. For the diagnosed processor at least, one of those
> hands definitely wants to be slapped.
>
> This doesn't seem to be an issue for L3 equipped CPUs, but perhaps is
> for some even modern processors, dunno (the boxen where regression was
> detected are far from new).
>
>> we could either remove the condition which tries to keep an even
>> number of tasks in each group until busiest group becomes overloaded
>> but it means that unrelated tasks may have to share same resources
>> or we could try to prevent the migration at wake up. I was looking at
>> wake_affine which seems to choose local cpu when both prev and local
>> cpu are idle. I wonder if local cpu is really a better choice when
>> both are idle
>
> I don't see a great alternative to turning it off off the top of my
> head, at least for processors with multiple LLCs. Yeah, unrelated
> tasks could end up sharing a cache needlessly, but will that hurt as
> badly as tasks not munching tasty hot data definitely does?

A memory-intensive task will probably be hurt.

>
> -Mike
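
For reference, a minimal sketch of the tie-breaker arithmetic discussed
in this thread. It is a toy model, not the kernel code: it assumes the
"+1" is slack applied when comparing the idle CPU counts of the local
and busiest groups, and the names idle_group_is_balanced(), idle_cpus()
and slack are hypothetical, introduced only for illustration.

```python
def idle_cpus(cpus_per_group, tasks_in_group):
    """Idle CPUs in a group, assuming at most one task per CPU (toy model)."""
    return max(cpus_per_group - tasks_in_group, 0)


def idle_group_is_balanced(local_idle, busiest_idle,
                           busiest_overloaded=False, slack=1):
    """Return True if an idle CPU should leave the busiest group alone.

    Hypothetical model of the check: the groups count as balanced when
    the busiest group is not overloaded and the local group has at most
    `slack` more idle CPUs than the busiest group.
    """
    if busiest_overloaded:
        return False
    return local_idle <= busiest_idle + slack


# 3 tasks on 2 groups of 2 CPUs: the busiest group runs 2, the local group 1.
three_on_two = (idle_cpus(2, 1), idle_cpus(2, 2))

# 2 communicating tasks packed into one group of 2 CPUs, other group empty.
buddies = (idle_cpus(2, 0), idle_cpus(2, 2))

# 3 tasks packed into one group of 4 CPUs, other group empty.
three_on_four = (idle_cpus(4, 0), idle_cpus(4, 3))

tie_break_holds = idle_group_is_balanced(*three_on_two, slack=1)
buddies_split_with_1 = not idle_group_is_balanced(*buddies, slack=1)
buddies_kept_with_2 = idle_group_is_balanced(*buddies, slack=2)
trio_split_with_2 = not idle_group_is_balanced(*three_on_four, slack=2)
```

Under these assumptions, slack=1 stops the single odd task from being
bounced around, but still lets the balancer split two buddies sharing
an MC. slack=2 keeps the pair together, yet three tasks on a 4-core MC
are split again, which is the case raised above.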