On Wed, Oct 24, 2018 at 08:32:49AM +0530, Srikar Dronamraju wrote:
> Load balancer and NUMA balancer are not supposed to work on isolcpus.
>
> Currently when setting sched affinity, there are no checks to see if the
> requested cpumask has CPUs from both isolcpus and housekeeping CPUs.
>
> If the user passes a mix of isolcpus and housekeeping CPUs, then the
> NUMA balancer can pick an isolcpu to schedule on.
> With this change, if a combination of isolcpus and housekeeping CPUs is
> provided, then we restrict ourselves to housekeeping CPUs.
>
> For example: system with 32 CPUs
> $ grep -o "isolcpus=[,,1-9]*" /proc/cmdline
> isolcpus=1,5,9,13
> $ grep -i cpus_allowed /proc/$$/status
> Cpus_allowed:      ffffdddd
> Cpus_allowed_list: 0,2-4,6-8,10-12,14-31
>
> Running "perf bench numa mem --no-data_rand_walk -p 4 -t 8 -G 0 -P 3072
> -T 0 -l 50 -c -s 1000", which calls sched_setaffinity to all CPUs in
> the system.
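For reference, the failing scenario boils down to something like this
untested userspace sketch (mine, not from the patch; it assumes the 32-CPU
machine with isolcpus=1,5,9,13 from the example above). The benchmark asks
for every CPU, isolated ones included, and the call succeeds without any
hint that the mask mixes the two:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	cpu_set_t mask;
	int cpu;

	/* Request every CPU, isolated or not, as the benchmark does */
	CPU_ZERO(&mask);
	for (cpu = 0; cpu < 32; cpu++)
		CPU_SET(cpu, &mask);

	/*
	 * Succeeds today even though CPUs 1,5,9,13 are isolated; the
	 * caller gets no error and no indication that the mask mixes
	 * isolated and housekeeping CPUs.
	 */
	if (sched_setaffinity(0, sizeof(mask), &mask))
		perror("sched_setaffinity");
	else
		printf("mask spanning isolated + housekeeping CPUs accepted\n");

	return 0;
}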
Forgive my naivety, but is it wrong for a process to bind to both isolated
CPUs and housekeeping CPUs? It would certainly be a bit odd, because the
application is asking for some protection but no guarantees are given, and
the application is not made aware via an error code that there is a
problem. Asking the application to parse dmesg in the hope of finding the
right error message is going to be fragile.

Would it be more appropriate to fail sched_setaffinity when there is a mix
of isolated and housekeeping CPUs (see the untested sketch at the end of
this mail)? In that case, an info message in dmesg may be appropriate, as
it'll likely be a once-off configuration error that's obvious due to an
application failure.

Alternatively, should NUMA balancing ignore isolated CPUs? The latter
seems unusual, as the application has specified a mask that allows those
CPUs and it's not clear why NUMA balancing should ignore them. If
anything, an application that wants to avoid all interference should also
be using memory policies to bind to nodes so it behaves predictably with
respect to access latencies (presumably an application that cannot
tolerate kernel threads interfering also cannot tolerate remote access
latencies), or it should disable NUMA balancing entirely to avoid
incurring minor faults.

Thanks.

-- 
Mel Gorman
SUSE Labs
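For completeness, the kind of check I have in mind for failing
sched_setaffinity would be something like the sketch below. It's rough and
untested, not a proposed patch; it assumes the existing
housekeeping_cpumask(HK_FLAG_DOMAIN) helper and leaves open where in the
syscall path such a check would live:

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/printk.h>
#include <linux/sched/isolation.h>

/*
 * Reject an affinity mask that spans both isolated and housekeeping
 * CPUs, so the application sees the configuration error directly
 * instead of having to fish a warning out of dmesg.
 */
static int check_mixed_isolation(const struct cpumask *new_mask)
{
	const struct cpumask *hk_mask = housekeeping_cpumask(HK_FLAG_DOMAIN);

	/* Mask touches housekeeping CPUs but is not contained in them,
	 * i.e. it also touches isolated CPUs: fail the syscall.
	 */
	if (cpumask_intersects(new_mask, hk_mask) &&
	    !cpumask_subset(new_mask, hk_mask)) {
		pr_info_once("affinity mask mixes isolated and housekeeping CPUs\n");
		return -EINVAL;
	}

	return 0;
}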