On 16 June 2016 at 22:07, Peter Zijlstra <pet...@infradead.org> wrote:
> On Thu, Jun 16, 2016 at 09:00:57PM +0200, Vincent Guittot wrote:
>> On 16 June 2016 at 20:51, Peter Zijlstra <pet...@infradead.org> wrote:
>> > On Thu, Jun 16, 2016 at 06:30:13PM +0200, Vincent Guittot wrote:
>> >> With patch [1] for the init of cfs_rq side, all use cases will be
>> >> covered regarding the issue linked to a last_update_time set to 0 at
>> >> init
>> >> [1] https://lkml.org/lkml/2016/5/30/508
>> >
>> > Aah, wait, now I get it :-)
>> >
>> > Still, we should put cfs_rq_clock_task(cfs_rq) in it, not 1. And since
>> > we now acquire rq->lock on init this should well be possible. Lemme sort
>> > that.
>>
>> yes, with the rq->lock we can use cfs_rq_clock_task, which makes more
>> sense than 1.
>> But the delta can still be significant between the creation of the
>> task group and the 1st task that will be attached to the cfs_rq.
>
> Ah, I think I've spotted more fail.
>
> And I think you're right, it doesn't matter, in fact, 0 should have been
> fine too!
>
>   enqueue_entity()
>     enqueue_entity_load_avg()
>       update_cfs_rq_load_avg()
>         now = clock()
>         __update_load_avg(&cfs_rq->avg)
>           cfs_rq->avg.last_load_update = now
>           // ages 0 load/util for: now - 0
>       if (migrated)
>         attach_entity_load_avg()
>           se->avg.last_load_update = cfs_rq->avg.last_load_update; // now != 0
>
> So I don't see how it can end up being attached again.
In fact, it has already been attached during sched_move_task(). The
sequence for the 1st task that is attached to a cfs_rq is:

  sched_move_task()
    task_move_group_fair()
      detach_task_cfs_rq()
      set_task_rq()
      attach_task_cfs_rq()
        attach_entity_load_avg()
          se->avg.last_load_update = cfs_rq->avg.last_load_update == 0

Then we enqueue the task, but se->avg.last_load_update is still 0, so
migrated is set and we attach the task one more time (see the toy model
at the end of this mail).

>
>
> Now I do see another problem, and that is that we're forgetting to
> update_cfs_rq_load_avg() in all detach_entity_load_avg() callers and all
> but the enqueue caller of attach_entity_load_avg().

Yes, calling update_cfs_rq_load_avg() before every attach_entity_load_avg()
will ensure that cfs_rq->avg.last_load_update is never 0 when attaching a
task. And doing that before the detach will ensure that we move an
up-to-date load.

Your proposal below looks good to me.

>
> Something like the below.
>
>
>
> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f75930bdd326..5d8fa135bbc5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8349,6 +8349,7 @@ static void detach_task_cfs_rq(struct task_struct *p)
>  {
>         struct sched_entity *se = &p->se;
>         struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +       u64 now = cfs_rq_clock_task(cfs_rq);
>
>         if (!vruntime_normalized(p)) {
>                 /*
> @@ -8360,6 +8361,7 @@ static void detach_task_cfs_rq(struct task_struct *p)
>         }
>
>         /* Catch up with the cfs_rq and remove our load when we leave */
> +       update_cfs_rq_load_avg(now, cfs_rq, false);
>         detach_entity_load_avg(cfs_rq, se);
>  }
>
> @@ -8367,6 +8369,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
>  {
>         struct sched_entity *se = &p->se;
>         struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +       u64 now = cfs_rq_clock_task(cfs_rq);
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>         /*
> @@ -8377,6 +8380,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
>  #endif
>
>         /* Synchronize task with its cfs_rq */
> +       update_cfs_rq_load_avg(now, cfs_rq, false);
>         attach_entity_load_avg(cfs_rq, se);
>
>         if (!vruntime_normalized(p))
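For illustration, below is a stand-alone toy model of the double attach and
of the effect of the patch. This is only a sketch, not the kernel code: the
struct layout, the fake clock and the attach counter are invented for the
demo, and it keeps the informal "last_load_update" name used in this thread
(the real field is se->avg.last_update_time).

#include <stdio.h>

/*
 * Toy model of the attach/enqueue interaction discussed above.
 * Build with: cc -Wall -o pelt_attach_demo pelt_attach_demo.c
 */

struct load_avg { unsigned long long last_load_update; };
struct cfs_rq { struct load_avg avg; };
struct sched_entity { struct load_avg avg; int attach_count; };

static unsigned long long fake_clock = 1000; /* stand-in for cfs_rq_clock_task() */

/* Ages the cfs_rq signal up to "now"; afterwards last_load_update != 0. */
static void update_cfs_rq_load_avg(unsigned long long now, struct cfs_rq *cfs_rq)
{
        cfs_rq->avg.last_load_update = now;
}

/* Copies the cfs_rq timestamp into the entity and counts the attach. */
static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
        se->avg.last_load_update = cfs_rq->avg.last_load_update;
        se->attach_count++;
}

/* Mirrors the enqueue path from the trace above: a zero timestamp means
 * "newly migrated", so the entity gets (re)attached. */
static void enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
        int migrated = !se->avg.last_load_update;

        update_cfs_rq_load_avg(fake_clock, cfs_rq);
        if (migrated)
                attach_entity_load_avg(cfs_rq, se);
}

int main(void)
{
        struct cfs_rq rq_a = { { 0 } }, rq_b = { { 0 } };
        struct sched_entity se_a = { { 0 }, 0 }, se_b = { { 0 }, 0 };

        /* Current behaviour: the move-group attach copies a still-zero
         * timestamp, so the later enqueue attaches a second time. */
        attach_entity_load_avg(&rq_a, &se_a);
        enqueue_entity_load_avg(&rq_a, &se_a);
        printf("without prior update: attached %d times\n", se_a.attach_count);

        /* With the proposed patch: update_cfs_rq_load_avg() runs first, the
         * copied timestamp is non-zero, and the enqueue does not re-attach. */
        update_cfs_rq_load_avg(fake_clock, &rq_b);
        attach_entity_load_avg(&rq_b, &se_b);
        enqueue_entity_load_avg(&rq_b, &se_b);
        printf("with prior update:    attached %d times\n", se_b.attach_count);

        return 0;
}

When run, it reports two attaches when the entity timestamp is copied from a
still-zero cfs_rq and a single attach once update_cfs_rq_load_avg() has run
before attach_entity_load_avg(), which is what the added calls in the patch
above guarantee for the attach and detach paths.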