On Tue, Apr 04, 2017 at 10:57:28PM +0530, Srikar Dronamraju wrote:
> When performing load balancing, numabalancing only looks at
> task->cpus_allowed to see if the task can run on the target cpu. If
> isolcpus kernel parameter is set, then isolated cpus will not be part of
> mask task->cpus_allowed.
> 
> For example: (On a Power 8 box running in smt 1 mode)
> 
> isolcpus=56,64,72,80,88
> 
> Cpus_allowed_list:    0-55,57-63,65-71,73-79,81-87,89-175
> /proc/20996/task/20996/status:Cpus_allowed_list:      
> 0-55,57-63,65-71,73-79,81-87,89-175
> /proc/20996/task/20997/status:Cpus_allowed_list:      
> 0-55,57-63,65-71,73-79,81-87,89-175
> /proc/20996/task/20998/status:Cpus_allowed_list:      
> 0-55,57-63,65-71,73-79,81-87,89-175
> 
> Note: offline cpus are excluded in cpus_allowed_list.
> 
> However a task might call sched_setaffinity() that includes all possible
> cpus in the system including the isolated cpus.
> 
> For example:
> perf bench numa mem --no-data_rand_walk -p 4 -t $THREADS -G 0 -P 3072 -T 0 -l 
> 50 -c -s 1000
> would call sched_setaffinity that resets the cpus_allowed mask.
> 
> Cpus_allowed_list:    0-55,57-63,65-71,73-79,81-87,89-175
> Cpus_allowed_list:    
> 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> Cpus_allowed_list:    
> 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> Cpus_allowed_list:    
> 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> Cpus_allowed_list:    
> 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152,160,168
> 
> The isolated cpus are part of the cpus allowed list. In the above case,
> numabalancing ends up scheduling some of these tasks on isolated cpus.
> 
> To avoid this, please check for isolated cpus before choosing a target
> cpu.
> 

Hmm, would this also prevent a task running inside a cgroup that is
allowed accessed to isolated CPUs from balancing? I severely doubt it
matters because if a process is isolated from interference then it
follows that automatic NUMA balancing should not be involved. If
anything the protection should be absolute but either way;

Acked-by: Mel Gorman <mgor...@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

Reply via email to