On Wed, Oct 12, 2016 at 09:41:36AM +0200, Vincent Guittot wrote:
> ok. In fact, I have noticed another regression with tip/sched/core and
> hackbench while looking at yours.
> I have bisect to :
> 10e2f1acd0 ("sched/core: Rewrite and improve select_idle_siblings")
>
> hackbench -P -g 1
>
>        v4.8        tip/sched/core  tip/sched/core+revert 10e2f1acd010
>                                    and 1b568f0aabf2
> min    0.051       0,052           0.049
> avg    0.057(0%)   0,062(-7%)      0.056(+1%)
> max    0.070       0,073           0.067
> stdev  +/-8%       +/-10%          +/-9%
>
> The issue seems to be that it prevents some migration at wake up at
> the end of hackbench test so we have last tasks that compete for the
> same CPU whereas other CPUs are idle in the same MC domain. I haven't
> to look more deeply which part of the patch do the regression yet

So select_idle_cpu(), which does the LLC-wide CPU scan, is now throttled
by a comparison between avg_cost and avg_idle, where avg_cost is a
historical measure of how costly it was to scan the entire LLC domain
and avg_idle is our current idle time guesstimate (also a historical
average).

The problem was that a number of workloads were spending quite a lot of
time here scanning CPUs while they could be doing useful work (esp.
since newer parts have silly amounts of CPUs per LLC).

The toggle is a heuristic with a random number in it; we could see if
there's anything better we can do. I know some people take the toggle
out entirely, but that will regress other workloads.
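
For illustration only, the throttle described above could be sketched in
user-space C roughly as follows. This is not the kernel code: struct
llc_stats, should_scan_llc() and update_scan_cost() are hypothetical
names, the fuzz divisor is left as a parameter because its value is not
given in this mail, and the 1/8 averaging weight is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-LLC scheduler-domain state. */
struct llc_stats {
	uint64_t avg_scan_cost;	/* running average cost of a full LLC scan, ns */
};

/*
 * Throttle: allow the LLC-wide scan only if the expected idle time,
 * scaled down by a fuzz divisor, still covers the average scan cost.
 * 'fuzz' is the "random number" in the heuristic (value assumed by caller).
 */
static bool should_scan_llc(uint64_t avg_idle, const struct llc_stats *st,
			    uint64_t fuzz)
{
	assert(fuzz > 0);
	return (avg_idle / fuzz) >= st->avg_scan_cost;
}

/*
 * Fold one measured scan time into the running average.
 * The 1/8 weight is an assumption made for this sketch.
 */
static void update_scan_cost(struct llc_stats *st, uint64_t scan_time)
{
	int64_t delta = ((int64_t)scan_time - (int64_t)st->avg_scan_cost) / 8;

	st->avg_scan_cost = (uint64_t)((int64_t)st->avg_scan_cost + delta);
}
```

A larger divisor makes the scan rarer, which saves scan time on big LLCs
but can leave a wakee stacked on a busy CPU while siblings sit idle,
which looks like the hackbench symptom reported above.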