On 09/21/2018 11:22 PM, Chris Samuel wrote:
> On Saturday, 22 September 2018 2:53:58 AM AEST Nicolas Bock wrote:
>
>> shows as requesting 1 CPU when in queue, but then allocates all
>> CPU cores once running. Why is that?
>
> Do you mean that Slurm expands the cores requested to all the cores
> on the node or allocates the node in exclusive mode, or do you mean
> that the code inside the job uses all the cores on the node instead
> of what was requested?
>
> The latter is often the case for badly behaved codes and that's why
> using cgroups to contain applications is so important.
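To make that concrete before I ask my own question: the cgroup containment Chris is referring to is roughly this combination of slurm.conf and cgroup.conf settings. The parameter names are the ones in the Slurm documentation; take the exact set below as an illustrative sketch rather than our literal site config.

    # slurm.conf -- hand process tracking and task placement to cgroups
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf -- confine each job to the cores/memory it was allocated
    CgroupAutomount=yes
    ConstrainCores=yes
    ConstrainRAMSpace=yes

With ConstrainCores=yes, a job that asked for one core can start as many threads as it likes, but they all stay pinned inside that one core's cgroup -- which is exactly the situation in my question below.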
I apologize for potentially thread hijacking here, but it's in the spirit of the original question, I guess.

We constrain using cgroups, and occasionally someone will request 1 core (-n1 -c1) and then run something that asks for far more cores/threads, or that tries to use the whole machine. Obviously they won't succeed in using more than they were allocated. Is this any sort of problem? It seems to me that trying to run 24 threads on a single core presumably adds some scheduling/context-switching overhead, and that I/O could increase, but I'm not sure how much it matters in practice.

What I do know is that if someone does this -- let's say in the extreme, by running something -n24 that itself tries to run 24 threads in each task -- and someone else uses the other 23 cores, you'll end up with a load average near 24*24+23, i.e. around 599. Does this make any difference? We have NHC set to offline such nodes, but that affects job preemption.

What sort of choices do others make in this area?

--
 ____
|| \\UTGERS,     |----------------------*O*------------------------
||_// the State  |    Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Res. Comp. - MSB C630, Newark
     `'
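P.S. For completeness, the user-side version of "behaving" in the scenario above is just sizing the thread pool to the allocation rather than to the machine. A minimal sketch, assuming an OpenMP-style code (the application name is made up; OMP_NUM_THREADS and SLURM_CPUS_PER_TASK are the standard OpenMP/Slurm variables):

    #!/bin/bash
    #SBATCH -n 1
    #SBATCH -c 1

    # Match the thread count to what was actually allocated,
    # instead of letting the runtime default to every core on the node.
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

    srun ./my_threaded_app   # hypothetical binary

My question is really about the jobs that don't do this, and whether the resulting oversubscription inside one cgroup is worth offlining a node over.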