Hi Tao,

On Fri, 8 May 2020 at 18:58, Tao Zhou <zohooou...@zoho.com.cn> wrote:
>
> On Fri, May 08, 2020 at 05:27:44PM +0200, Vincent Guittot wrote:
> > On Fri, 8 May 2020 at 17:12, Tao Zhou <zohooou...@zoho.com.cn> wrote:
> > >
> > > Hi Phil,
> > >
> > > On Thu, May 07, 2020 at 04:36:12PM -0400, Phil Auld wrote:
> > > > sched/fair: Fix enqueue_task_fair warning some more
> > > >
> > > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > > > did not fully resolve the issues with the rq->tmp_alone_branch !=
> > > > &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> > > > the first for_each_sched_entity loop exits due to on_rq, having
> > > > incompletely updated the list. In this case the second
> > > > for_each_sched_entity loop can further modify se. The later code to fix
> > > > up the list management fails to do what is needed because se no longer
> > > > points to the sched_entity which broke out of the first loop.
> > > >
> > > > Address this by calling leaf_add_rq_list if there are throttled parents
> > > > while doing the second for_each_sched_entity loop.
> > >
> > > Thanks for your trace imformation and explanation. I
> > > truely have learned from this and that.
> > >
> > > s/leaf_add_rq_list/list_add_leaf_cfs_rq/
> > >
> > > >
> > > > Suggested-by: Vincent Guittot <vincent.guit...@linaro.org>
> > > > Signed-off-by: Phil Auld <pa...@redhat.com>
> > > > Cc: Peter Zijlstra (Intel) <pet...@infradead.org>
> > > > Cc: Vincent Guittot <vincent.guit...@linaro.org>
> > > > Cc: Ingo Molnar <mi...@kernel.org>
> > > > Cc: Juri Lelli <juri.le...@redhat.com>
> > > > ---
> > > >  kernel/sched/fair.c | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 02f323b85b6d..c6d57c334d51 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > >                  /* end evaluation on encountering a throttled cfs_rq */
> > > >                  if (cfs_rq_throttled(cfs_rq))
> > > >                          goto enqueue_throttle;
> > > > +
> > > > +                /*
> > > > +                 * One parent has been throttled and cfs_rq removed from the
> > > > +                 * list. Add it back to not break the leaf list.
> > > > +                 */
> > > > +                if (throttled_hierarchy(cfs_rq))
> > > > +                        list_add_leaf_cfs_rq(cfs_rq);
> > > >          }
> > >
> > > I was confused by why the throttled cfs rq can be on list.
> > > It is possible when enqueue a task and thanks to the 'threads'.
> > > But I think the above comment does not truely put the right
> > > intention, right ?
> > > If throttled parent is onlist, the child cfs_rq is ignored
> > > to be added to the leaf cfs_rq list me think.
> > >
> > > unthrottle_cfs_rq() follows the same logic if i am not wrong.
> > > Is it necessary to add the above to it ?
> >
> > When a cfs_rq is throttled, its sched group is dequeued and all child
> > cfs_rq are removed from leaf_cfs_rq list. But the sched group of the
> > child cfs_rq stay enqueued in the throttled cfs_rq so child sched
> > group->on_rq might be still set.
>
> If there is a throttle of throttle, and unthrottle the child throttled
> cfs_rq(ugly):
>
> ...
>   |
> cfs_rq throttled (parent A)
>   |
>   |
> cfs_rq in hierarchy (B)
>   |
>   |
> cfs_rq throttled (C)
>   |
> ...
>
> Then unthrottle the child throttled cfs_rq C, now the A is on the
> leaf_cfs_rq list.
> sched_group entity of C is enqueued to B, and
> sched_group entity of B is on_rq and is ignored by enqueue but in
> the throttled hierarchy and not add to leaf_cfs_rq list.
> The above may be absolutely wrong that I miss something.
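For reference, the two enqueue_task_fair() loops this discussion keeps
coming back to have roughly the following shape around v5.7. This is a
simplified sketch with most of the accounting trimmed, so the exact
lines and placement are approximate rather than the upstream code:

static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
        struct cfs_rq *cfs_rq;
        struct sched_entity *se = &p->se;

        /* 1st loop: enqueue se's until we reach a parent that is already on_rq */
        for_each_sched_entity(se) {
                if (se->on_rq)
                        break;
                cfs_rq = cfs_rq_of(se);
                enqueue_entity(cfs_rq, se, flags);

                /* end evaluation on encountering a throttled cfs_rq */
                if (cfs_rq_throttled(cfs_rq))
                        goto enqueue_throttle;

                flags = ENQUEUE_WAKEUP;
        }

        /* 2nd loop: update the ancestors that were already enqueued */
        for_each_sched_entity(se) {
                cfs_rq = cfs_rq_of(se);

                update_load_avg(cfs_rq, se, UPDATE_TG);

                /* end evaluation on encountering a throttled cfs_rq */
                if (cfs_rq_throttled(cfs_rq))
                        goto enqueue_throttle;

                /* the V2 hunk quoted above adds the fixup here */
                if (throttled_hierarchy(cfs_rq))
                        list_add_leaf_cfs_rq(cfs_rq);
        }

enqueue_throttle:
        /* last loop: repair the leaf_cfs_rq list after a break/goto above */
        if (cfs_bandwidth_used()) {
                for_each_sched_entity(se) {
                        cfs_rq = cfs_rq_of(se);

                        if (list_add_leaf_cfs_rq(cfs_rq))
                                break;
                }
        }

        assert_list_leaf_cfs_rq(rq);
}

The fix-up loop after enqueue_throttle is the one added by fe61468b2cb;
the warning triggers when the loops above leave the leaf list only
partially rebuilt before the assert.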
Several things:

Your example above is safe IMO because when C is unthrottled, its
group se will be enqueued on B, which will be added to the leaf_cfs_rq
list. Then the group se of B is already on_rq but A is throttled, and
the 1st loop breaks. The 2nd loop will ensure that A is added to the
leaf_cfs_rq list.

Now, if we add one more level between C and A, we have a problem and
we should add something similar in the else case.

Finally, while checking unthrottle_cfs_rq(), the test
"if (!cfs_rq->load.weight) return" skips all the for_each_sched_entity
loops and can break the leaf_cfs_rq list. We need to jump to the last
loop in such a case.

> Another thing :
> In enqueue_task_fair():
>
>         for_each_sched_entity(se) {
>                 cfs_rq = cfs_rq_of(se);
>
>                 if (list_add_leaf_cfs_rq(cfs_rq))
>                         break;
>         }
>
> In unthrottle_cfs_rq():
>
>         for_each_sched_entity(se) {
>                 cfs_rq = cfs_rq_of(se);
>
>                 list_add_leaf_cfs_rq(cfs_rq);
>         }
>
> The difference between them is that if condition, add if
> condition to unthrottle_cfs_rq() may be an optimization and
> keep the same.

Yes, we can do the same kind of optimization.

> > > Thanks,
> > > Tau
> > >
> > > >
> > > >  enqueue_throttle:
> > > > --
> > > > 2.18.0
> > > >
> > > > V2 rework the fix based on Vincent's suggestion. Thanks Vincent.
> > > >
> > > >
> > > > Cheers,
> > > > Phil
> > > >
> > > > --
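To make the two unthrottle_cfs_rq() suggestions above concrete, here is
a rough, untested sketch. The surrounding code is simplified (idle
accounting and bandwidth/runtime handling omitted) and the
unthrottle_fixup label name is made up for illustration, so treat it as
the idea rather than a real patch:

void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
{
        struct rq *rq = rq_of(cfs_rq);
        struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
        long task_delta = cfs_rq->h_nr_running;
        int enqueue = 1;

        /* bandwidth/runtime accounting omitted in this sketch */

        /*
         * Do not return early anymore: even with no load to enqueue,
         * the leaf_cfs_rq list may still need to be repaired below.
         */
        if (!cfs_rq->load.weight)
                goto unthrottle_fixup;

        for_each_sched_entity(se) {
                if (se->on_rq)
                        enqueue = 0;

                cfs_rq = cfs_rq_of(se);
                if (enqueue)
                        enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
                cfs_rq->h_nr_running += task_delta;

                /* end evaluation on encountering a throttled cfs_rq */
                if (cfs_rq_throttled(cfs_rq))
                        break;
        }

        if (!se)
                add_nr_running(rq, task_delta);

unthrottle_fixup:
        /*
         * Walk back up and repair the leaf_cfs_rq list, stopping as soon
         * as an ancestor is already on the list, the same optimization
         * enqueue_task_fair() already uses.
         */
        for_each_sched_entity(se) {
                cfs_rq = cfs_rq_of(se);

                if (list_add_leaf_cfs_rq(cfs_rq))
                        break;
        }

        assert_list_leaf_cfs_rq(rq);
}

The label mirrors what enqueue_task_fair() already does with
enqueue_throttle, so both paths leave the leaf list consistent before
the assert.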