Thanks for the responses. I think I didn't investigate deeply enough - it appears 
that although I saw many processes running and a very high load average, the 
cgroups are indeed allocating the correct number of cores to the jobs, and the 
extra threads are simply waiting to run on the same cores/threads that were 
allocated.

I guess that when this happens, the load average in 'top' can show an extremely 
elevated number because lots of processes are waiting to run - but in fact the 
node is still largely available, as there are plenty of free cores left for 
other jobs. Would this be an accurate interpretation of the scheduling and load 
I'm observing? Are there any impacts on the node's performance when it is in 
this state?
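For what it's worth, a quick sanity check along these lines (a sketch; it only assumes standard Linux procfs and GNU coreutils) is to compare the load average against the machine's core count:

```shell
# Compare the 1-minute load average with the number of online cores.
# A load far above the core count usually means runnable tasks are
# queuing (e.g. threads contending inside a small cgroup), not that
# the whole node is saturated.
load=$(cut -d' ' -f1 /proc/loadavg)
cores=$(nproc --all)
echo "load=${load} cores=${cores}"
```

If the load mostly comes from one over-threaded job pinned to a few cores, the rest of the node should still schedule new jobs normally.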

Thanks everyone.


-----Original Message-----
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Chris Samuel
Sent: Friday, December 8, 2017 6:46 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] detectCores() mess

On 9/12/17 4:54 am, Mike Cammilleri wrote:

> I thought cgroups (which we are using) would prevent some of this 
> behavior on the nodes (we are constraining CPU and RAM) -I'd like 
> there to be no I/O wait times if possible. I would like it if either 
> linux or slurm could constrain a job from grabbing more cores than 
> assigned at submit time. Is there something else I should be 
> configuring to safeguard against this behavior? If SLURM assigns 1 cpu 
> to the task then no matter what craziness is in the code, 1 is all 
> they're getting. Possible?

That is exactly what cgroups does: a process within a cgroup that only has a 
single core available to it will only be able to use that one core.  If it 
fires up (for example) 8 threads or processes then they will all run, but they 
will all be contending for that single core.
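You can see the same effect outside of a cgroup (a sketch, assuming util-linux's taskset and GNU nproc are installed) by restricting a command's CPU affinity and asking how many CPUs it thinks it has:

```shell
# Pin the child command to CPU 0 only. GNU nproc honours the
# affinity mask, so it reports 1 even on a many-core machine;
# any threads the child spawned would all share that one CPU.
taskset -c 0 nproc    # prints 1
```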

You can check the cgroup for a process with:

cat /proc/$PID/cgroup

From that you should be able to find the cgroup in the cpuset controller and 
see how many cores are available to it.
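As a concrete sketch (cgroup mount points vary by distro, so rather than walking /sys/fs/cgroup the snippet reads the allowed-CPU list straight from procfs, which works for both cgroup v1 and v2):

```shell
# List which cgroups the current shell belongs to, then show the
# CPUs the kernel will actually let it run on. For a Slurm job,
# substitute the job step's PID for "self".
cat /proc/self/cgroup
grep Cpus_allowed_list /proc/self/status
```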

You mention I/O wait times; that's going to be separate from the number of cores 
available to a code.  Could you elaborate a little on what you are seeing there?

There is some support for I/O limits in current kernels, but I don't know when 
that landed and whether it will be in the kernel available to you.  Also, I 
don't remember seeing mention of support for it in Slurm.

https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt

Best of luck,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
