On 18 October 2016 at 23:58, Joonwoo Park <joonw...@codeaurora.org> wrote: > > > On 10/18/2016 04:56 AM, Vincent Guittot wrote: >> >> Le Tuesday 18 Oct 2016 à 12:34:12 (+0200), Peter Zijlstra a écrit : >>> >>> On Tue, Oct 18, 2016 at 11:45:48AM +0200, Vincent Guittot wrote: >>>> >>>> On 18 October 2016 at 11:07, Peter Zijlstra <pet...@infradead.org> >>>> wrote: >>>>> >>>>> So aside from funny BIOSes, this should also show up when creating >>>>> cgroups when you have offlined a few CPUs, which is far more common I'd >>>>> think. >>>> >>>> >>>> The problem is also that the load of the tg->se[cpu] that represents >>>> the tg->cfs_rq[cpu] is initialized to 1024 in: >>>> alloc_fair_sched_group >>>> for_each_possible_cpu(i) { >>>> init_entity_runnable_average(se); >>>> sa->load_avg = scale_load_down(se->load.weight); >>>> >>>> Initializing sa->load_avg to 1024 for a newly created task makes >>>> sense as we don't know yet what will be its real load but i'm not sure >>>> that we have to do the same for se that represents a task group. This >>>> load should be initialized to 0 and it will increase when task will be >>>> moved/attached into task group >>> >>> >>> Yes, I think that makes sense, not sure how horrible that is with the >> >> >> That should not be that bad because this initial value is only useful for >> the few dozens of ms that follow the creation of the task group >> >>> >>> current state of things, but after your propagate patch, that >>> reinstates the interactivity hack that should work for sure. >> >> >> The patch below fixes the issue on my platform: >> >> Dietmar, Omer can you confirm that this fix the problem of your platform >> too ? > > > I just noticed this thread after posting > https://lkml.org/lkml/2016/10/18/719... > Noticed this bug while a ago and had the patch above at least a week but > unfortunately didn't have time to post... > I think Omer had same problem I was trying to fix and I believe patch I post > should address it. > > Vincent, your version fixes my test case as well.
Thanks for testing. Can i consider this as a Tested-by ? > This is sched_stat from the same test case I had in my changelog. > Note dd-2030 which is in root cgroup had same runtime as dd-2033 which is in > child cgroup. > > dd (2030, #threads: 1) > ------------------------------------------------------------------- > se.exec_start : 275700.024137 > se.vruntime : 10589.114654 > se.sum_exec_runtime : 1576.837993 > se.nr_migrations : 0 > nr_switches : 159 > nr_voluntary_switches : 0 > nr_involuntary_switches : 159 > se.load.weight : 1048576 > se.avg.load_sum : 48840575 > se.avg.util_sum : 19741820 > se.avg.load_avg : 1022 > se.avg.util_avg : 413 > se.avg.last_update_time : 275700024137 > policy : 0 > prio : 120 > clock-delta : 34 > dd (2033, #threads: 1) > ------------------------------------------------------------------- > se.exec_start : 275710.037178 > se.vruntime : 2383.802868 > se.sum_exec_runtime : 1576.547591 > se.nr_migrations : 0 > nr_switches : 162 > nr_voluntary_switches : 0 > nr_involuntary_switches : 162 > se.load.weight : 1048576 > se.avg.load_sum : 48316646 > se.avg.util_sum : 21235249 > se.avg.load_avg : 1011 > se.avg.util_avg : 444 > se.avg.last_update_time : 275710037178 > policy : 0 > prio : 120 > clock-delta : 36 > > Thanks, > Joonwoo > > >> >> --- >> kernel/sched/fair.c | 9 ++++++++- >> 1 file changed, 8 insertions(+), 1 deletion(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 8b03fb5..89776ac 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -690,7 +690,14 @@ void init_entity_runnable_average(struct sched_entity >> *se) >> * will definitely be update (after enqueue). >> */ >> sa->period_contrib = 1023; >> - sa->load_avg = scale_load_down(se->load.weight); >> + /* >> + * Tasks are intialized with full load to be seen as heavy task >> until >> + * they get a chance to stabilize to their real load level. >> + * group entity are intialized with null load to reflect the fact >> that >> + * nothing has been attached yet to the task group. >> + */ >> + if (entity_is_task(se)) >> + sa->load_avg = scale_load_down(se->load.weight); >> sa->load_sum = sa->load_avg * LOAD_AVG_MAX; >> /* >> * At this point, util_avg won't be used in select_task_rq_fair >> anyway >> >> >> >> >