On 03/05/16 11:12, Peter Zijlstra wrote:
> On Fri, Apr 29, 2016 at 08:32:41PM +0100, Dietmar Eggemann wrote:
>> Avoid the need to add scaled_busy_load_per_task on both sides of the if
>> condition to determine whether imbalance has to be set to
>> busiest->load_per_task or not.
>>
>> The imbn variable was introduced with commit 2dd73a4f09be ("[PATCH]
>> sched: implement smpnice") and the original if condition was
>>
>>     if (max_load - this_load >= busiest_load_per_task * imbn)
>>
>> which over time changed into the current version where
>> scaled_busy_load_per_task is to be found on both sides of
>> the if condition.
>
> This appears to have started with:
>
>   dd41f596cda0 ("sched: cfs core code")
>
> which for unexplained reasons does:
>
> -	if (max_load - this_load >= busiest_load_per_task * imbn) {
> +	if (max_load - this_load + SCHED_LOAD_SCALE_FUZZ >=
> +			busiest_load_per_task * imbn) {
>
> And later patches (by me) change that FUZZ into a variable metric,
> because a fixed fuzz like that didn't at all work for the small loads
> that result from cgroup tasks.
>
> Now fix_small_imbalance() always hurt my head; it originated in the
> original sched_domain balancer from Nick which wasn't smpnice aware; and
> lives on until today.

I see, all this code is already in the history.git kernel.
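FWIW, what the patch is after is really just folding the common term so
scaled_busy_load_per_task shows up only once. A sketch of the algebra
(not necessarily the exact diff; the "current upstream" condition is
paraphrased from fix_small_imbalance() in kernel/sched/fair.c, where
imbn is either 1 or 2):

	/* current upstream: scaled_busy_load_per_task on both sides */
	if (busiest->avg_load + scaled_busy_load_per_task >=
	    local->avg_load + (scaled_busy_load_per_task * imbn)) {
		env->imbalance = busiest->load_per_task;
		return;
	}

	/* equivalent form: the per-task term appears only once */
	if (busiest->avg_load >=
	    local->avg_load + scaled_busy_load_per_task * (imbn - 1)) {
		env->imbalance = busiest->load_per_task;
		return;
	}

With imbn == 2 this reduces to busiest->avg_load >= local->avg_load +
scaled_busy_load_per_task, and with imbn == 1 to a plain avg_load
comparison.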
> Its purpose is to determine if moving one task over is beneficial.
> However over time -- and smpnice started this -- the idea of _one_ task
> became quite muddled.
>
> With the fine grained load accounting of today; does it even make sense
> to ask this question? IOW. what does fix_small_imbalance() really gain
> us -- other than a head-ache?

So task priority breaks the assumption that 1 task is equivalent to
SCHED_LOAD_SCALE, and so does fine grained load accounting.

fix_small_imbalance() is called twice from calculate_imbalance(). If we
got rid of it, I don't know whether we should bail out of load balancing
in case the avg_load values don't align nicely (busiest > sd avg >
local), or just continue.

In the second case, where the imbalance value is raised (to
busiest->load_per_task), we can probably just continue with load
balancing, hoping that there is a task on the src rq which fits the
smaller imbalance value.
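To make the two call sites explicit (paraphrased from
calculate_imbalance() in kernel/sched/fair.c of the current kernel, so
details may differ slightly):

	/*
	 * (1) The avg_load values don't align nicely, i.e. we don't have
	 *     busiest->avg_load > sds->avg_load > local->avg_load. Today
	 *     we punt to fix_small_imbalance(); without it the question
	 *     is whether to bail out of load balancing here instead.
	 */
	if (busiest->avg_load <= sds->avg_load ||
	    local->avg_load >= sds->avg_load) {
		env->imbalance = 0;
		return fix_small_imbalance(env, sds);
	}

	/* ... env->imbalance is computed here ... */

	/*
	 * (2) The computed imbalance is smaller than the busiest group's
	 *     per-task load; this is the case where fix_small_imbalance()
	 *     may raise env->imbalance to busiest->load_per_task.
	 */
	if (env->imbalance < busiest->load_per_task)
		return fix_small_imbalance(env, sds);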