On 04/07/2023 11:11, Tobias Huschle wrote:
> On 2023-05-16 18:35, Dietmar Eggemann wrote:
>> On 15/05/2023 13:46, Tobias Huschle wrote:
>>> The current load balancer implementation implies that scheduler groups,
>>> within the same scheduler domain, all host the same number of CPUs.
>>>
>>> This appears to be valid for non-s390 architectures. Nevertheless, s390
>>> can actually have scheduler groups of unequal size.
>>
>> Arm (classical) big.Little had this for years before we switched to flat
>> scheduling (only MC sched domain) over CPU capacity boundaries for Arm
>> DynamIQ.
>>
>> Arm64 Juno platform in mainline:
>>
>> root@juno:~# cat /sys/devices/system/cpu/cpu*/topology/cluster_cpus_list
>> 0,3-5
>> 1-2
>> 1-2
>> 0,3-5
>> 0,3-5
>> 0,3-5
>>
>> root@juno:~# cat /proc/schedstat | grep ^domain | awk '{print $1, $2}'
>>
>> domain0 39 <--
>> domain1 3f
>> domain0 06 <--
>> domain1 3f
>> domain0 06
>> domain1 3f
>> domain0 39
>> domain1 3f
>> domain0 39
>> domain1 3f
>> domain0 39
>> domain1 3f
>>
>> root@juno:~# cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
>> MC
>> DIE
>>
>> But we don't have SMT on the mobile processors.
>>
>> It looks like you are only interested to get group_weight dependency
>> into this 'prefer_sibling' condition of find_busiest_group()?
>>
> Sorry, looks like your reply hit some bad filter of my mail program.
> Let me answer, although it's a bit late.
>
> Yes, I would like to get the group_weight into the prefer_sibling path.
> Unfortunately, we cannot go for a flat hierarchy as the s390 hardware
> allows to have CPUs to be pretty far apart (cache-wise), which means,
> the load balancer should avoid to move tasks back and forth between
> those CPUs if possible.
>
> We can't remove SD_PREFER_SIBLING either, as this would cause the load
> balancer to aim for having the same number of idle CPUs in all groups,
> which is a problem as well in asymmetric groups, for example:
>
> With SD_PREFER_SIBLING, aiming for same number of non-idle CPUs
> 00 01 02 03 04 05 06 07 08 09 10 11 || 12 13 14 15
> x x x x x x x x
>
> Without SD_PREFER_SIBLING, aiming for the same number of idle CPUs
> 00 01 02 03 04 05 06 07 08 09 10 11 || 12 13 14 15
> x x x x x x x x
>
>
> Hence the idea to add the group_weight to the prefer_sibling path.
>
> I was wondering if this would be the right place to address this issue
> or if I should go down another route.
Yes, it's the right place to fix it for you. IMHO, there is still some
discussion needed about the correct condition and the changes in
calculate_imbalance() for your case, if I read the comments on this
thread correctly.

Arm64 big.LITTLE wouldn't be affected since we explicitly remove
SD_PREFER_SIBLING on MC for our legacy MC,DIE setups to avoid spreading
tasks across DIE sched groups holding CPUs with different capacities.

[...]
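To make the two cases from your example concrete, here is a toy model in
Python. It is not kernel code: place() and the three key functions are
hypothetical names made up for this mail. It assumes one task per CPU,
the 12 + 4 CPU split from your diagrams and 8 runnable tasks. The third
policy, filling groups in proportion to group_weight, is only my reading
of what the prefer_sibling change is aiming for, not the condition
proposed in the patch.

```python
from fractions import Fraction


def place(tasks, weights, key):
    """Greedily place `tasks` tasks, one per CPU, into sched groups.

    `weights` holds the number of CPUs per group (group_weight).
    Each task goes to the group with a free CPU that minimises
    key(busy, weight); ties go to the lower-indexed group.
    Returns the number of busy CPUs per group.
    """
    busy = [0] * len(weights)
    for _ in range(tasks):
        candidates = [g for g, w in enumerate(weights) if busy[g] < w]
        if not candidates:
            break  # more tasks than CPUs; out of scope for this toy model
        target = min(candidates, key=lambda g: key(busy[g], weights[g]))
        busy[target] += 1
    return busy


def equal_busy(busy, weight):
    """With SD_PREFER_SIBLING: even out the number of non-idle CPUs."""
    return busy


def equal_idle(busy, weight):
    """Without SD_PREFER_SIBLING: even out the number of idle CPUs."""
    return -(weight - busy)  # prefer the group with the most idle CPUs


def by_weight(busy, weight):
    """Assumed goal: even out busy CPUs relative to group_weight."""
    return Fraction(busy, weight)  # exact, so ties are handled reliably


if __name__ == "__main__":
    groups = [12, 4]  # CPUs 00-11 || 12-15
    for policy in (equal_busy, equal_idle, by_weight):
        print(policy.__name__, place(8, groups, policy))
```

In this model equal_busy ends up with 4 + 4 busy CPUs (the small group
is full), equal_idle with 8 + 0 (the small group is empty), and
by_weight with 6 + 2, i.e. half of each group busy.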