Hi,

We are seeing strange behaviour in Slurm after updating from 18.08.7 to 18.08.8, for jobs that use --exclusive and --mem-per-cpu.
Our nodes have 128 GB of memory and 28 cores.

$ srun --mem-per-cpu=30000 -n 1 --exclusive hostname
=> works in 18.08.7
=> doesn't work in 18.08.8

In 18.08.8:
- If --mem-per-cpu is lower than full_memory_size_of_node / nb_cores_per_node (so lower than 4681 MB on our nodes), it works fine.
- If --mem-per-cpu is higher, the job stays pending even though its start date is set to now. In the slurmctld logs we see the error "backfill: Failed to start JobId=xxxx with reserve avail: Requested nodes are busy" every 30s, so slurmctld tries to start it again and again.
- If I use --exclusive=user, it works.

On another cluster I also tried version 19.05.2 and saw the same behaviour. With slurm-19.05.3 the job is refused with the error: "srun: error: Unable to allocate resources: Requested node configuration is not available".

I can't upgrade my production cluster to version 19... Will there be a patch for the 18.08 branch?

We have a workaround using --exclusive, --ntasks-per-node and (--ntasks or --nodes); an example is given below. But sometimes, when depopulating nodes, asking only for --ntasks and --mem-per-cpu together with --exclusive makes it easy to modify a job by increasing the memory per task without knowing the memory size of the node: Slurm works out how many tasks are distributed over the right number of nodes...
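To make the workaround concrete, this is roughly what we submit now; the values are only an illustration for our 28-core / 128 GB nodes, matching the single-task example above:

$ srun --exclusive --mem-per-cpu=30000 --ntasks-per-node=1 --ntasks=1 hostname

With --ntasks-per-node given explicitly the job starts, but we have to know the node geometry in advance, which is exactly what the old behaviour let us avoid.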
Was this new behaviour intentional? I can't find anything about it in the release notes (except the patch for 19.05.3). We have academic and non-academic users on the same cluster, and the non-academic users require --exclusive.

Thank you in advance for your help,

Sincerely,
Béatrice

--
Béatrice CHARTON | CRIANN
beatrice.char...@criann.fr | 745, avenue de l'Université
Tel : +33 (0)2 32 91 42 91 | 76800 Saint Etienne du Rouvray
--- Support : supp...@criann.fr ---