On 09/21/2018 11:22 PM, Chris Samuel wrote:
> On Saturday, 22 September 2018 2:53:58 AM AEST Nicolas Bock wrote:
>
>> shows as requesting 1 CPU when in queue, but then allocates all
>> CPU cores once running. Why is that?
>
> Do you mean that Slurm expands the cores requested to all the cores
> on the node or allocates the node in exclusive mode, or do you mean
> that the code inside the job uses all the cores on the node instead
> of what was requested?
>
> The latter is often the case for badly behaved codes and that's why
> using cgroups to contain applications is so important.
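To make that concrete before I ask my own question: the cgroup containment Chris is referring to is roughly this combination of slurm.conf and cgroup.conf settings. The parameter names are the ones in the Slurm documentation; take the exact set below as an illustrative sketch rather than our literal site config.

    # slurm.conf -- hand process tracking and task placement to cgroups
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf -- confine each job to the cores/memory it was allocated
    CgroupAutomount=yes
    ConstrainCores=yes
    ConstrainRAMSpace=yes

With ConstrainCores=yes, a job that asked for one core can start as many threads as it likes, but they all stay pinned inside that one core's cgroup -- which is exactly the situation in my question below.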
I apologize for potentially thread hijacking here, but it's in the spirit of the original question, I guess.

We constrain using cgroups, and occasionally someone will request 1 core (-n1 -c1) and then run something that asks for far more cores/threads, or that tries to use the whole machine. Obviously they won't succeed in using more than they were allocated. Is this any sort of problem? It seems to me that trying to run 24 threads on a single core presumably adds some scheduling/context-switching overhead, and that I/O could increase, but I'm not sure how much it matters in practice.

What I do know is that if someone does this -- let's say in the extreme, by running something -n24 that itself tries to run 24 threads in each task -- and someone else uses the other 23 cores, you'll end up with a load average near 24*24+23, i.e. around 599. Does this make any difference? We have NHC set to offline such nodes, but that affects job preemption.

What sort of choices do others make in this area?

--
 ____
|| \\UTGERS,     |----------------------*O*------------------------
||_// the State  |    Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Res. Comp. - MSB C630, Newark
     `'
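P.S. For completeness, the user-side version of "behaving" in the scenario above is just sizing the thread pool to the allocation rather than to the machine. A minimal sketch, assuming an OpenMP-style code (the application name is made up; OMP_NUM_THREADS and SLURM_CPUS_PER_TASK are the standard OpenMP/Slurm variables):

    #!/bin/bash
    #SBATCH -n 1
    #SBATCH -c 1

    # Match the thread count to what was actually allocated,
    # instead of letting the runtime default to every core on the node.
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

    srun ./my_threaded_app   # hypothetical binary

My question is really about the jobs that don't do this, and whether the resulting oversubscription inside one cgroup is worth offlining a node over.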