Hi, > On Jul 24, 2020, at 10:05 AM, Li, Aubrey <aubrey...@linux.intel.com> wrote: > > On 2020/7/24 9:26, benbjiang(蒋彪) wrote: >> Hi, >> >>> On Jul 24, 2020, at 7:43 AM, Aubrey Li <aubrey.in...@gmail.com> wrote: >>> >>> On Thu, Jul 23, 2020 at 4:28 PM benbjiang(蒋彪) <benbji...@tencent.com> wrote: >>>> >>>> Hi, >>>> >>>>> On Jul 23, 2020, at 4:06 PM, Li, Aubrey <aubrey...@linux.intel.com> wrote: >>>>> >>>>> On 2020/7/23 15:47, benbjiang(蒋彪) wrote: >>>>>> Hi, >>>>>> >>>>>>> On Jul 23, 2020, at 1:39 PM, Li, Aubrey <aubrey...@linux.intel.com> >>>>>>> wrote: >>>>>>> >>>>>>> On 2020/7/23 12:23, benbjiang(蒋彪) wrote: >>>>>>>> Hi, >>>>>>>>> On Jul 23, 2020, at 11:35 AM, Li, Aubrey <aubrey...@linux.intel.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> On 2020/7/23 10:42, benbjiang(蒋彪) wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>>> On Jul 23, 2020, at 9:57 AM, Li, Aubrey <aubrey...@linux.intel.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> On 2020/7/22 22:32, benbjiang(蒋彪) wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>>> On Jul 22, 2020, at 8:13 PM, Li, Aubrey >>>>>>>>>>>>> <aubrey...@linux.intel.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On 2020/7/22 16:54, benbjiang(蒋彪) wrote: >>>>>>>>>>>>>> Hi, Aubrey, >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai >>>>>>>>>>>>>>> <vpil...@digitalocean.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> From: Aubrey Li <aubrey...@intel.com> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Don't migrate if there is a cookie mismatch >>>>>>>>>>>>>>> Load balance tries to move task from busiest CPU to the >>>>>>>>>>>>>>> destination CPU. When core scheduling is enabled, if the >>>>>>>>>>>>>>> task's cookie does not match with the destination CPU's >>>>>>>>>>>>>>> core cookie, this task will be skipped by this CPU. This >>>>>>>>>>>>>>> mitigates the forced idle time on the destination CPU. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Select cookie matched idle CPU >>>>>>>>>>>>>>> In the fast path of task wakeup, select the first cookie matched >>>>>>>>>>>>>>> idle CPU instead of the first idle CPU. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Find cookie matched idlest CPU >>>>>>>>>>>>>>> In the slow path of task wakeup, find the idlest CPU whose core >>>>>>>>>>>>>>> cookie matches with task's cookie >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Don't migrate task if cookie not match >>>>>>>>>>>>>>> For the NUMA load balance, don't migrate task to the CPU whose >>>>>>>>>>>>>>> core cookie does not match with task's cookie >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Signed-off-by: Aubrey Li <aubrey...@linux.intel.com> >>>>>>>>>>>>>>> Signed-off-by: Tim Chen <tim.c.c...@linux.intel.com> >>>>>>>>>>>>>>> Signed-off-by: Vineeth Remanan Pillai <vpil...@digitalocean.com> >>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>> kernel/sched/fair.c | 64 >>>>>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++---- >>>>>>>>>>>>>>> kernel/sched/sched.h | 29 ++++++++++++++++++++ >>>>>>>>>>>>>>> 2 files changed, 88 insertions(+), 5 deletions(-) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>>>>>>>>>>>>>> index d16939766361..33dc4bf01817 100644 >>>>>>>>>>>>>>> --- a/kernel/sched/fair.c >>>>>>>>>>>>>>> +++ b/kernel/sched/fair.c >>>>>>>>>>>>>>> @@ -2051,6 +2051,15 @@ static void task_numa_find_cpu(struct >>>>>>>>>>>>>>> task_numa_env *env, >>>>>>>>>>>>>>> if (!cpumask_test_cpu(cpu, env->p->cpus_ptr)) >>>>>>>>>>>>>>> continue; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>>>>>>> + /* >>>>>>>>>>>>>>> + * Skip this cpu if source task's cookie does not >>>>>>>>>>>>>>> match >>>>>>>>>>>>>>> + * with CPU's core cookie. >>>>>>>>>>>>>>> + */ >>>>>>>>>>>>>>> + if (!sched_core_cookie_match(cpu_rq(cpu), env->p)) >>>>>>>>>>>>>>> + continue; >>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> env->dst_cpu = cpu; >>>>>>>>>>>>>>> if (task_numa_compare(env, taskimp, groupimp, >>>>>>>>>>>>>>> maymove)) >>>>>>>>>>>>>>> break; >>>>>>>>>>>>>>> @@ -5963,11 +5972,17 @@ find_idlest_group_cpu(struct >>>>>>>>>>>>>>> sched_group *group, struct task_struct *p, int this >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /* Traverse only the allowed CPUs */ >>>>>>>>>>>>>>> for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) { >>>>>>>>>>>>>>> + struct rq *rq = cpu_rq(i); >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>>>>>>> + if (!sched_core_cookie_match(rq, p)) >>>>>>>>>>>>>>> + continue; >>>>>>>>>>>>>>> +#endif >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> if (sched_idle_cpu(i)) >>>>>>>>>>>>>>> return i; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if (available_idle_cpu(i)) { >>>>>>>>>>>>>>> - struct rq *rq = cpu_rq(i); >>>>>>>>>>>>>>> struct cpuidle_state *idle = >>>>>>>>>>>>>>> idle_get_state(rq); >>>>>>>>>>>>>>> if (idle && idle->exit_latency < >>>>>>>>>>>>>>> min_exit_latency) { >>>>>>>>>>>>>>> /* >>>>>>>>>>>>>>> @@ -6224,8 +6239,18 @@ static int select_idle_cpu(struct >>>>>>>>>>>>>>> task_struct *p, struct sched_domain *sd, int t >>>>>>>>>>>>>>> for_each_cpu_wrap(cpu, cpus, target) { >>>>>>>>>>>>>>> if (!--nr) >>>>>>>>>>>>>>> return -1; >>>>>>>>>>>>>>> - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) >>>>>>>>>>>>>>> - break; >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> + if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> +#ifdef CONFIG_SCHED_CORE >>>>>>>>>>>>>>> + /* >>>>>>>>>>>>>>> + * If Core Scheduling is enabled, select >>>>>>>>>>>>>>> this cpu >>>>>>>>>>>>>>> + * only if the process cookie matches core >>>>>>>>>>>>>>> cookie. >>>>>>>>>>>>>>> + */ >>>>>>>>>>>>>>> + if (sched_core_enabled(cpu_rq(cpu)) && >>>>>>>>>>>>>>> + p->core_cookie == >>>>>>>>>>>>>>> cpu_rq(cpu)->core->core_cookie) >>>>>>>>>>>>>> Why not also add similar logic in select_idle_smt to reduce >>>>>>>>>>>>>> forced-idle? :) >>>>>>>>>>>>> We hit select_idle_smt after we scaned the entire LLC domain for >>>>>>>>>>>>> idle cores >>>>>>>>>>>>> and idle cpus and failed,so IMHO, an idle smt is probably a good >>>>>>>>>>>>> choice under >>>>>>>>>>>>> this scenario. >>>>>>>>>>>> >>>>>>>>>>>> AFAIC, selecting idle sibling with unmatched cookie will cause >>>>>>>>>>>> unnecessary fored-idle, unfairness and latency, compared to >>>>>>>>>>>> choosing *target* cpu. >>>>>>>>>>> Choosing target cpu could increase the runnable task number on the >>>>>>>>>>> target runqueue, this >>>>>>>>>>> could trigger busiest->nr_running > 1 logic and makes the idle >>>>>>>>>>> sibling trying to pull but >>>>>>>>>>> not success(due to cookie not match). Putting task to the idle >>>>>>>>>>> sibling is relatively stable IMHO. >>>>>>>>>> >>>>>>>>>> I’m afraid that *unsuccessful* pullings between smts would not >>>>>>>>>> result in unstableness, because >>>>>>>>>> the load-balance always do periodicly , and unsuccess means nothing >>>>>>>>>> happen. >>>>>>>>> unsuccess pulling means more unnecessary overhead in load balance. >>>>>>>>> >>>>>>>>>> On the contrary, unmatched sibling tasks running concurrently could >>>>>>>>>> bring forced-idle to each other repeatedly, >>>>>>>>>> Which is more unstable, and more costly when pick_next_task for all >>>>>>>>>> siblings. >>>>>>>>> Not worse than two tasks ping-pong on the same target run queue I >>>>>>>>> guess, and better if >>>>>>>>> - task1(cookie A) is running on the target, and task2(cookie B) in >>>>>>>>> the runqueue, >>>>>>>>> - task3(cookie B) coming >>>>>>>>> >>>>>>>>> If task3 chooses target's sibling, it could have a chance to run >>>>>>>>> concurrently with task2. >>>>>>>>> But if task3 chooses target, it will wait for next pulling luck of >>>>>>>>> load balancer >>>>>>>> That’s more interesting. :) >>>>>>>> Distributing different cookie tasks onto different cpus(or cpusets) >>>>>>>> could be the *ideal stable status* we want, as I understood. >>>>>>>> Different cookie tasks running on sibling smts could hurt performance, >>>>>>>> and that should be avoided with best effort. >>>>>>> We already tried to avoid when we scan idle cores and idle cpus in llc >>>>>>> domain. >>>>>> >>>>>> I’m afraid that’s not enough either, :) >>>>>> 1. Scanning Idle cpus is not a full scan, there is limit according to >>>>>> scan cost. >>>>>> 2. That's only trying at the *core/cpu* level, *SMT* level should be >>>>>> considered too. >>>>>> >>>>>>> >>>>>>>> For above case, selecting idle sibling cpu can improve the concurrency >>>>>>>> indeed, but it decrease the imbalance for load-balancer. >>>>>>>> In that case, load-balancer could not notice the imbalance, and would >>>>>>>> do nothing to improve the unmatched situation. >>>>>>>> On the contrary, choosing the *target* cpu could enhance the >>>>>>>> imbalance, and load-balancer could try to pull unmatched task away, >>>>>>> Pulling away to where needs another bunch of elaboration. >>>>>> >>>>>> Still with the SMT2+3tasks case, >>>>>> if *idle sibling* chosen, >>>>>> Smt1’s load = task1+task2, smt2’s load = task3. Task3 will run >>>>>> intermittently because of forced-idle, >>>>>> so smt2’s real load could low enough, that it could not be pulled away >>>>>> forever. That’s indeed a stable state, >>>>>> but with performance at a discount. >>>>>> >>>>>> If *target sibling* chose, >>>>>> Smt1’s load = task1+task2+task3, smt2’s load=0. It’s a obvious >>>>>> imbalance, and load-balancer will pick a task to pull, >>>>>> 1. If task1(cookie A) picked, that’s done for good. >>>>>> 2. If task2(cookie B) or task3(cookie B) picked, that’s ok too, the rest >>>>>> task(cookie B) could be pulled away at next balance(maybe need to >>>>>> improve the pulling to tend to pull matched task more aggressively). >>>>>> And then, we may reach a more stable state *globally* without >>>>>> performance discount. >>>>> >>>>> I'm not sure what you mean pulled away, >>>> I mean pulled away by other cpus, may be triggered by idle balance or >>>> periodic balance on other cpus. >>>> >>>>> - if you mean pulled away from this core, cookieA in idle sibling case >>>>> can be >>>>> pulled away too. >>>> Yep, cookieA(task1) in idle sibling case could be pulled away, but >>>> cookieB(task3) on the smt2 could never get the chance being pulled >>>> away(unless being waken up). >>>> If cookieA(task1) failed being pulled(cookieB(task2) on smt1 may be pulled, >>>> 50% chance), cookieA(task1) and cookieB(task3) would reach the stable state >>>> with performance discount. >>>> >>> If you meant pulled away from this core, I didn't see how two cases are >>> different either. For example, when task2(cookieB) runs on SMT1, task3 >>> cookieb can be pulled to SMT2. and when task1(cookieA) switch onto SMT1, >>> task2(cookieB) can be pulled away by other cpus, too. >> That’s the case only if SMT2’s pulling happens when task2(cookieB) is running >> on SMT1, which depends on, >> 1. Smt2 not entering tickless or nohz_balancer_kick picks smt2 before other >> cpu’s pulling, may be unlikely. :) >> 2. Task1(cookieA) is not running on SMT1. >> otherwise it would be the case I described above. >> >> Besides, for other cases, like smt2+2task(CookieA+CookieB), picking *target* >> cpu instead of *idle sibling* could be more helpful to reach the global >> stable >> status(distribute different cookies onto different cores). >> If the task number of two cookies has a big difference, then distributing > different cookies onto different cores leads to a big imbalance, that state > may > be stable but not an optimal state, I guess that's why core run queue does > not refuse different cookies onto its rb tree. That’s the overcommit case, in which case distributing is not possible, and would fallback to the *local stable status*(as what current implementation does) too.
If the total task number is not overcommit, distributing could work. :) > > I think I understand your concern but IMHO I'm not convinced adding cookie > match > in idle SMT selection is a best choice, if you have some performance data of > your > workload, that will be very helpful to understand the case. Got it, I’ll try later. > > If distributing different cookies onto different cores is a hard requirement > from > your side, you are welcome to submit a patch to see others opinion. Thanks for your patience. If possible, could you please loop me in for future discussions about core-scheduling. Thx. Regard, Jiang > > Thanks, > -Aubrey