On 06/08/2020 17:52, benbjiang(蒋彪) wrote:
> Hi, 
> 
>> On Aug 6, 2020, at 9:29 PM, Dietmar Eggemann <[email protected]> 
>> wrote:
>>
>> On 03/08/2020 13:26, benbjiang(蒋彪) wrote:
>>>
>>>
>>>> On Aug 3, 2020, at 4:16 PM, Dietmar Eggemann <[email protected]> 
>>>> wrote:
>>>>
>>>> On 01/08/2020 04:32, Jiang Biao wrote:
>>>>> From: Jiang Biao <[email protected]>

[...]

>> How would you deal with se's representing taskgroups which contain
>> SCHED_IDLE and SCHED_NORMAL tasks or other taskgroups doing that?
> I’m not sure I get the point. :) How about the following patch,
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fa8dbcfa4d..8715f03ed6d7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2994,6 +2994,9 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct 
> sched_entity *se)
>                 list_add(&se->group_node, &rq->cfs_tasks);
>         }
>  #endif
> +       if (task_has_idle_policy(task_of(se)))
> +               cfs_rq->idle_nr_running++;
> +
>         cfs_rq->nr_running++;
>  }
> 
> @@ -3007,6 +3010,9 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct 
> sched_entity *se)
>                 list_del_init(&se->group_node);
>         }
>  #endif
> +       if (task_has_idle_policy(task_of(se)))
> +               cfs_rq->idle_nr_running--;
> +
>         cfs_rq->nr_running--;
>  }
> 
> @@ -4527,7 +4533,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity 
> *curr, int queued)
>                 return;
>  #endif
> 
> -       if (cfs_rq->nr_running > 1)
> +       if (cfs_rq->nr_running > cfs_rq->idle_nr_running + 1 &&
> +           cfs_rq->h_nr_running - cfs_rq->idle_h_nr_running > 
> cfs_rq->idle_nr_running + 1)
>                 check_preempt_tick(cfs_rq, curr);
>  }
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 877fb08eb1b0..401090393e09 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -500,6 +500,7 @@ struct cfs_bandwidth { };
>  struct cfs_rq {
>         struct load_weight      load;
>         unsigned int            nr_running;
> +       unsigned int            idle_nr_running;
>         unsigned int            h_nr_running;      /* 
> SCHED_{NORMAL,BATCH,IDLE} */
>         unsigned int            idle_h_nr_running; /* SCHED_IDLE */

         /
       / |  \
      A  n0 i0
     / \
    n1 i1

I don't think this will work. E.g. the patch would prevent tick
preemption between 'A' and 'n0' on '/' as well

(3 > 1 + 1) && (4 - 2 > 1 + 1)

You also have to make sure that a SCHED_IDLE task can tick preempt
another SCHED_IDLE task.

>>> I’m not sure if it’s ok to do that, because the IDLE class seems not to be 
>>> so
>>> pure that could tolerate starving.
>>
>> Not sure I understand but idle_sched_class is not the same as SCHED_IDLE
>> (policy)?
> The case is that we need tasks(low priority, called offline tasks) to utilize 
> the
> spare cpu left by CFS SCHED_NORMAL tasks(called online tasks) without
> interfering the online tasks. 
> Offline tasks only run when there’s no runnable online tasks, and offline 
> tasks
> never preempt online tasks.
> The SCHED_IDLE policy seems not to be abled to be qualified for that 
> requirement,
> because it has a weight(3), even though it’s small, but it can still preempt 
> online
> tasks considering the fairness. In that way, offline tasks of SCHED_IDLE 
> policy
> could interfere the online tasks.

Because of this very small weight (weight=3), compared to a SCHED_NORMAL
nice 0 task (weight=1024), a SCHED_IDLE task is penalized by a huge
se->vruntime value (1024/3 higher than for a SCHED_NORMAL nice 0 task).
This should make sure it doesn't tick preempt a SCHED_NORMAL nice 0 task.

It's different when the SCHED_NORMAL task has nice 19 (weight=15) but
that's part of the CFS design.

> On the other hand, idle_sched_class seems not to be qualified either. It’s too
> simple and only used for per-cpu idle task currently.

Yeah, leave this for the rq->idle task (swapper/X).

>>> We need an absolutely low priority class that could tolerate starving, which
>>> could be used to co-locate offline tasks. But IDLE class seems to be not
>>> *low* enough, if considering the fairness of CFS, and IDLE class still has a
>>> weight.

Understood. But this (tick) preemption should happen extremely rarely,
especially if you have SCHED_NORMAL nice 0 tasks, right?

Reply via email to