Beatrice Charton <beatrice.char...@criann.fr> writes:

> Hi,
>
> We have a strange behaviour of Slurm after updating from 18.08.7 to
> 18.08.8, for jobs using --exclusive and --mem-per-cpu.
>
> Our nodes have 128GB of memory, 28 cores.
> $ srun --mem-per-cpu=30000 -n 1 --exclusive hostname
> => works in 18.08.7
> => doesn’t work in 18.08.8

I'm actually surprised it _worked_ in 18.08.7. At one time, long before v18.08, the behaviour of --exclusive was changed: in order to account the job for all cpus on the node, the number of cpus asked for with --ntasks is simply multiplied by "#cpus-on-node / --ntasks" (so in your case: 28). Unfortunately, that also means that the memory the job requires per node is "#cpus-on-node / --ntasks" multiplied by --mem-per-cpu (in your case 28 * 30000 MiB ~= 820 GiB), which is far more than the 128 GB your nodes have.

For this reason, we tend to ban --exclusive on our clusters (or at least warn about it).

I haven't looked at the code for a long time, so I don't know whether this is still the current behaviour, but every time I've tested, I've seen the same problem. I believe I've tested on 19.05 (but I might remember wrong).

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
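
For illustration, the arithmetic described above can be sketched in Python. This only mirrors the multiplication as explained in this message; it is not Slurm's code, and the function and variable names are made up for the example. It also assumes that the nodes' "128GB" means 128 GiB fully available to jobs.

```python
# Sketch of the arithmetic described in the message above; NOT Slurm's actual code.
# All names here are hypothetical and exist only for this illustration.
MIB_PER_GIB = 1024


def exclusive_mem_request_mib(cpus_on_node: int, ntasks: int, mem_per_cpu_mib: int) -> int:
    """Per-node memory request if --exclusive scales the cpu count to the whole node."""
    # The cpus asked for with --ntasks are multiplied by "#cpus-on-node / --ntasks".
    scaled_cpus = ntasks * (cpus_on_node // ntasks)
    # --mem-per-cpu is then applied to every one of those cpus.
    return scaled_cpus * mem_per_cpu_mib


requested = exclusive_mem_request_mib(cpus_on_node=28, ntasks=1, mem_per_cpu_mib=30000)
node_mem_mib = 128 * MIB_PER_GIB  # assumption: 128 GiB usable per node

print(f"requested: {requested} MiB (~{requested / MIB_PER_GIB:.0f} GiB)")
print(f"fits on node: {requested <= node_mem_mib}")
```

With the numbers from the original report, the request comes to 840000 MiB, so a job like this could not fit on a 128 GB node if the multiplication is applied as described.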