Hi,

> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann <dietmar.eggem...@arm.com> wrote:
>
> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>
>>
>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann <dietmar.eggem...@arm.com> wrote:
>>>
>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>> From: Jiang Biao <benbji...@tencent.com>
>>>>
>>>> No need to preempt when there is only one runnable CFS task with
>>>> other IDLE tasks on the runqueue. The only CFS task would always
>>>> be picked in that case.
>>>>
>>>> Signed-off-by: Jiang Biao <benbji...@tencent.com>
>>>> ---
>>>> kernel/sched/fair.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 04fa8dbcfa4d..8fb80636b010 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -4527,7 +4527,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>>>  		return;
>>>>  #endif
>>>>
>>>> -	if (cfs_rq->nr_running > 1)
>>>> +	if (cfs_rq->nr_running > cfs_rq.idle_h_nr_running + 1)
>>>
>>> cfs_rq is a pointer.
>> It is. Sorry about that. :)
>>
>>>
>>>>  		check_preempt_tick(cfs_rq, curr);
>>>>  }
>>>
>>> You can't compare cfs_rq->nr_running with cfs_rq->idle_h_nr_running!
>>>
>>> There is a difference between cfs_rq->h_nr_running and
>>> cfs_rq->nr_running. The '_h_' stands for hierarchical.
>>>
>>> The former gives you hierarchical task accounting whereas the latter is
>>> the number of sched entities (representing tasks or taskgroups) enqueued
>>> in the cfs_rq.
>>>
>>> In entity_tick(), cfs_rq->nr_running has to be used for the condition to
>>> call check_preempt_tick(). We want to check if curr has to be preempted
>>> by __pick_first_entity(cfs_rq) on this cfs_rq.
>>>
>>> entity_tick() is called for each sched entity (and so for each
>>> cfs_rq_of(se)) of the task group hierarchy (e.g. task p running in
>>> taskgroup /A/B : se(p) -> se(A/B) -> se(A)).
>> That's true. I was thinking of adding a new cfs_rq->idle_nr_running member
>> to track the number of IDLE tasks on each cfs_rq, and reducing preemption
>> here based on that.
>
> How would you deal with se's representing taskgroups which contain
> SCHED_IDLE and SCHED_NORMAL tasks or other taskgroups doing that?
I'm not sure I get the point. :)
How about the following patch:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04fa8dbcfa4d..8715f03ed6d7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2994,6 +2994,9 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		list_add(&se->group_node, &rq->cfs_tasks);
 	}
 #endif
+	if (entity_is_task(se) && task_has_idle_policy(task_of(se)))
+		cfs_rq->idle_nr_running++;
+
 	cfs_rq->nr_running++;
 }
 
@@ -3007,6 +3010,9 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		list_del_init(&se->group_node);
 	}
 #endif
+	if (entity_is_task(se) && task_has_idle_policy(task_of(se)))
+		cfs_rq->idle_nr_running--;
+
 	cfs_rq->nr_running--;
 }
 
@@ -4527,7 +4533,8 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
 		return;
 #endif
 
-	if (cfs_rq->nr_running > 1)
+	if (cfs_rq->nr_running > cfs_rq->idle_nr_running + 1 &&
+	    cfs_rq->h_nr_running - cfs_rq->idle_h_nr_running > cfs_rq->idle_nr_running + 1)
 		check_preempt_tick(cfs_rq, curr);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 877fb08eb1b0..401090393e09 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -500,6 +500,7 @@ struct cfs_bandwidth { };
 struct cfs_rq {
 	struct load_weight	load;
 	unsigned int		nr_running;
+	unsigned int		idle_nr_running;
 	unsigned int		h_nr_running;      /* SCHED_{NORMAL,BATCH,IDLE} */
 	unsigned int		idle_h_nr_running; /* SCHED_IDLE */

>
>> I'm not sure if it's ok to do that, because the IDLE class seems not to be
>> so pure that it could tolerate starving.
>
> Not sure I understand but idle_sched_class is not the same as SCHED_IDLE
> (policy)?
The case is that we need low-priority tasks (called offline tasks) to
utilize the spare CPU left by CFS SCHED_NORMAL tasks (called online
tasks) without interfering with the online tasks. Offline tasks should
only run when there are no runnable online tasks, and offline tasks
should never preempt online tasks.
The SCHED_IDLE policy does not seem qualified for that requirement,
because it still has a weight (3). Even though that weight is small, a
SCHED_IDLE task can still preempt online tasks in the name of fairness,
so offline tasks under the SCHED_IDLE policy could interfere with the
online tasks.
On the other hand, idle_sched_class does not seem qualified either. It
is too simple and is currently only used for the per-cpu idle task.
Thx.
Regards,
Jiang

>
>> We need an absolutely low-priority class that could tolerate starving,
>> which could be used to co-locate offline tasks. But the IDLE class seems
>> not to be *low* enough, considering the fairness of CFS, and the IDLE
>> class still has a weight.
>
> [...]
>
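
P.S. Purely for illustration, and not part of the patch above: a minimal
userspace sketch of how an offline task would opt into the SCHED_IDLE
policy discussed here, using the standard sched_setscheduler() call. The
"offline workload" placeholder is hypothetical.

#define _GNU_SOURCE		/* exposes SCHED_IDLE in <sched.h> */
#include <sched.h>
#include <stdio.h>

int main(void)
{
	/* SCHED_IDLE only accepts a static priority of 0 */
	struct sched_param sp = { .sched_priority = 0 };

	/* pid 0 means the calling process */
	if (sched_setscheduler(0, SCHED_IDLE, &sp) == -1) {
		perror("sched_setscheduler(SCHED_IDLE)");
		return 1;
	}

	/*
	 * The offline workload would run here with the minimal weight (3);
	 * as noted above, it can still occasionally preempt online
	 * (SCHED_NORMAL) tasks because of CFS fairness.
	 */
	return 0;
}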