Changelog since v2
o Build fix for !NUMA_BALANCING configurations

Changelog since v1
o Split out patch that moves imbalance calculation
o Strongly connect fork imbalance considerations with adjust_numa_imbalance
When NUMA and CPU balancing were reconciled, there was an attempt to
allow a degree of imbalance, but it caused more problems than it
solved. Instead, imbalance was only allowed when a NUMA domain was
almost idle. A lot of the problems have since been addressed, so it's
time for a revisit. There is also an issue with how fork is balanced
across nodes. It's mentioned in this context because patches 3 and 4
should share similar behaviour in terms of a node's utilisation.

Patch 1 is just a cosmetic rename.

Patch 2 moves an imbalance calculation. It is both a micro-optimisation
and avoids conflating what imbalance means for different group types.

Patch 3 allows a "floating" imbalance to exist so that communicating
tasks can remain on the same domain until utilisation is higher. It
aims to balance compute availability against memory bandwidth. A rough
sketch of the shared cut-off is included after the diffstat.

Patch 4 is the interesting one. Currently, fork can allow a NUMA node
to be completely utilised as long as there are idle CPUs, until the
load balancer gets involved. This caused serious problems with a real
workload that unfortunately I cannot share many details about, but
there is a proxy reproducer.

Mel Gorman (4):
  sched/numa: Rename nr_running and break out the magic number
  sched: Avoid unnecessary calculation of load imbalance at clone time
  sched/numa: Allow a floating imbalance between NUMA nodes
  sched: Limit the amount of NUMA imbalance that can exist at fork time

 kernel/sched/fair.c | 44 +++++++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 13 deletions(-)

--
2.26.2
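For illustration, below is a minimal userspace sketch of the kind of
cut-off patches 3 and 4 are built around. The 25% utilisation threshold
and the tolerated imbalance of two tasks are assumptions for the sake of
the example, not necessarily the values the patches use; the real logic
lives in adjust_numa_imbalance() in kernel/sched/fair.c.

	/*
	 * Illustrative sketch only -- not the patch contents. The
	 * threshold and minimum imbalance are assumed values.
	 */
	#include <stdio.h>

	/* Tolerate imbalance while the node is under ~25% utilised. */
	static int allow_numa_imbalance(int dst_running, int dst_weight)
	{
		return dst_running < (dst_weight >> 2);
	}

	/*
	 * Ignore a small imbalance (e.g. a pair of communicating tasks
	 * staying local) while the destination has plenty of spare
	 * capacity; otherwise report the full imbalance so tasks are
	 * spread for memory bandwidth.
	 */
	static long adjust_numa_imbalance(long imbalance, int dst_running,
					  int dst_weight)
	{
		if (!allow_numa_imbalance(dst_running, dst_weight))
			return imbalance;

		if (imbalance <= 2)	/* assumed tolerated minimum */
			return 0;

		return imbalance;
	}

	int main(void)
	{
		/* Lightly loaded node: a 2-task imbalance is tolerated. */
		printf("%ld\n", adjust_numa_imbalance(2, 4, 32));	/* 0 */
		/* Busy node: the same imbalance is reported and corrected. */
		printf("%ld\n", adjust_numa_imbalance(2, 16, 32));	/* 2 */
		return 0;
	}

The point of a single helper like this is that both the regular load
balancer path and the fork-time placement path can consult the same
cut-off, which is the sense in which patches 3 and 4 share behaviour.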