On Fri, Dec 11, 2020 at 08:43:37PM +0000, Mel Gorman wrote: > One bug is in __select_idle_core() though. It's scanning the SMT mask, > not select_idle_mask so it can return an idle candidate that is not in > p->cpus_ptr.
D'0h.. luckily the benchmarks don't hit that :-) > There are some other potential caveats. > > This is a single pass so when test_idle_cores() is true, __select_idle_core > is used to to check all the siblings even if the core is not idle. That > could have been cut short if __select_idle_core checked *idle_cpu == > 1 and terminated the SMT scan if an idle candidate had already been found. So I did that on purpose, so as to track the last/most-recent idle cpu, with the thinking that that cpu has the higher chance of still being idle vs one we checked earlier/longer-ago. I suppose we benchmark both and see which is liked best. > Second downside is related. If test_idle_cpus() was true but no idle > CPU is found then __select_idle_core has been called enough to scan > the entire domain. In this corner case, the new code does *more* work > because the old code would have failed select_idle_core() quickly and > then select_idle_cpu() would be throttled by SIS_PROP. I think this will > only be noticable in the heavily overloaded case but if the corner case > hits enough then the new code will be slower than the old code for the > over-saturated case (i.e. hackbench with lots of groups). Right, due to scanning siblings, even if the first inspected thread is not idle, we scan more. > The third potential downside is that the SMT sibling is not guaranteed to > be checked due to SIS_PROP throttling but in the old code, that would have > been checked by select_idle_smt(). That might result in premature stacking > of runnable tasks on the same CPU. Similarly, as __select_idle_core may > find multiple idle candidates, it will not pick the targets SMT sibling > if it is idle like select_idle_smt would have. > > That said, I am skeptical that select_idle_smt() matters all that often. This, I didn't really believe in it either. The benchmarks I started are mostly noise, with a few wins for TCP_STREAM and UDP_RR around the 50% mark. Although I should run longer variants to make sure.