On 21 June 2016 at 15:17, Peter Zijlstra <pet...@infradead.org> wrote:
> On Tue, Jun 21, 2016 at 10:41:19AM +0200, Peter Zijlstra wrote:
>> On Mon, Jun 20, 2016 at 03:49:34PM +0100, Dietmar Eggemann wrote:
>> > On 20/06/16 13:35, Vincent Guittot wrote:
>>
>> > > It will go through wake_up_new_task and post_init_entity_util_avg
>> > > during its fork which is enough to set last_update_time. Then, it will
>> > > use the switched_to_fair if the task becomes a fair one
>> >
>> > Oh I see. We want to make sure that every task (even when forked as
>> > !fair) has a last_update_time value != 0, when becoming fair one day.
>>
>> Right, see 2 below. I need to write a bunch of comments explaining PELT
>> proper, as well as document these things.
>>
>> The things we ran into with these patches were that:
>>
>>  1) You need to update the cfs_rq _before_ any entity attach/detach
>>     (and might need to update_tg_load_avg when update_cfs_rq_load_avg()
>>     returns true).
>>
>>  2) (fair) entities are always attached, switched_from/to deal with !fair.
>>
>>  3) cpu migration is the only exception and uses the last_update_time=0
>>     thing -- because refusal to take second rq->lock.
>>
>> Which is why I dislike Yuyang's patches, they create more exceptions
>> instead of applying existing rules (albeit undocumented).
>>
>> Esp. 1 is important, because while for mathematically consistency you
>> don't actually need to do this, you only need the entities to be
>> up-to-date with the cfs rq when you attach/detach, but that forgets the
>> temporal aspect of _when_ you do this.
>
> I have the below for now, I'll continue poking at this for a bit.
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -692,6 +692,7 @@ void init_entity_runnable_average(struct
>
>  static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
>  static int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq);
> +static void update_tg_load_avg(struct cfs_rq *cfs_rq, int force);
>  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se);
>
>  /*
> @@ -757,7 +758,8 @@ void post_init_entity_util_avg(struct sc
>  		}
>  	}
>
> -	update_cfs_rq_load_avg(now, cfs_rq, false);
> +	if (update_cfs_rq_load_avg(now, cfs_rq, false))
> +		update_tg_load_avg(cfs_rq, false);
You should move update_tg_load_avg() after attach_entity_load_avg() so that
it takes into account the newly attached task (see the sketch at the end of
this mail).

>  	attach_entity_load_avg(cfs_rq, se);
>  }
>
> @@ -2919,7 +2921,21 @@ static inline void cfs_rq_util_change(st
>  		WRITE_ONCE(*ptr, res);					\
>  	} while (0)
>
> -/* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
> +/**
> + * update_cfs_rq_load_avg - update the cfs_rq's load/util averages
> + * @now: current time, as per cfs_rq_clock_task()
> + * @cfs_rq: cfs_rq to update
> + * @update_freq: should we call cfs_rq_util_change() or will the call do so
> + *
> + * The cfs_rq avg is the direct sum of all its entities (blocked and runnable)
> + * avg. The immediate corollary is that all (fair) tasks must be attached, see
> + * post_init_entity_util_avg().
> + *
> + * cfs_rq->avg is used for task_h_load() and update_cfs_share() for example.
> + *
> + * Returns true if the load decayed or we removed utilization. It is expected
> + * that one calls update_tg_load_avg() on this condition.
> + */
>  static inline int
>  update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>  {
> @@ -2974,6 +2990,14 @@ static inline void update_load_avg(struc
>  		update_tg_load_avg(cfs_rq, 0);
>  }
>
> +/**
> + * attach_entity_load_avg - attach this entity to its cfs_rq load avg
> + * @cfs_rq: cfs_rq to attach to
> + * @se: sched_entity to attach
> + *
> + * Must call update_cfs_rq_load_avg() before this, since we rely on
> + * cfs_rq->avg.last_update_time being current.
> + */
>  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>  	if (!sched_feat(ATTACH_AGE_LOAD))
> @@ -3005,6 +3029,14 @@ static void attach_entity_load_avg(struc
>  	cfs_rq_util_change(cfs_rq);
>  }
>
> +/**
> + * detach_entity_load_avg - detach this entity from its cfs_rq load avg
> + * @cfs_rq: cfs_rq to detach from
> + * @se: sched_entity to detach
> + *
> + * Must call update_cfs_rq_load_avg() before this, since we rely on
> + * cfs_rq->avg.last_update_time being current.
> + */
>  static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>  	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> @@ -8392,7 +8424,8 @@ static void detach_task_cfs_rq(struct ta
>  	}
>
>  	/* Catch up with the cfs_rq and remove our load when we leave */
> -	update_cfs_rq_load_avg(now, cfs_rq, false);
> +	if (update_cfs_rq_load_avg(now, cfs_rq, false))
> +		update_tg_load_avg(cfs_rq, false);

Same as for post_init_entity_util_avg(): you should put it after
detach_entity_load_avg().

>  	detach_entity_load_avg(cfs_rq, se);
>  }
>
> @@ -8411,7 +8444,8 @@ static void attach_task_cfs_rq(struct ta
>  #endif
>
>  	/* Synchronize task with its cfs_rq */
> -	update_cfs_rq_load_avg(now, cfs_rq, false);
> +	if (update_cfs_rq_load_avg(now, cfs_rq, false))
> +		update_tg_load_avg(cfs_rq, false);

Same as for post_init_entity_util_avg().

>  	attach_entity_load_avg(cfs_rq, se);
>
>  	if (!vruntime_normalized(p))
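To make the suggestion concrete, here is a minimal sketch of the call order I
have in mind for post_init_entity_util_avg(); the same reordering would apply
to attach_task_cfs_rq() and detach_task_cfs_rq(). This is only an illustration
built on the helpers quoted in the hunks above, not a tested patch, and it
leaves the existing util_avg/util_sum initialisation out:

	void post_init_entity_util_avg(struct sched_entity *se)
	{
		struct cfs_rq *cfs_rq = cfs_rq_of(se);
		u64 now = cfs_rq_clock_task(cfs_rq);

		/* ... existing util_avg/util_sum initialisation elided ... */

		/* bring the cfs_rq averages up to 'now' before touching the entity */
		update_cfs_rq_load_avg(now, cfs_rq, false);

		/* add the new entity's contribution to the cfs_rq averages */
		attach_entity_load_avg(cfs_rq, se);

		/*
		 * Propagate to the task group only now, so tg->load_avg also
		 * reflects the newly attached task.
		 */
		update_tg_load_avg(cfs_rq, false);
	}

With the ordering in your patch, the new task's contribution only reaches
tg->load_avg at the next update, which is what I would like to avoid.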