Hi,

We are seeing strange behaviour in Slurm after updating from 18.08.7 to 18.08.8, 
for jobs using --exclusive and --mem-per-cpu.

Our nodes have 128 GB of memory and 28 cores.
        $ srun --mem-per-cpu=30000 -n 1 --exclusive hostname
=> works in 18.08.7
=> doesn't work in 18.08.8

In 18.08.8:
- If --mem-per-cpu is lower than (full_memory_size_of_node / nb_cores_per_node), 
i.e. lower than 4681 MB, it works fine (see the example below).
- If --mem-per-cpu is higher, the job stays pending even though its start date is 
set to now. In the slurmctld logs we see the error "backfill: Failed to start 
JobId=xxxx with reserve avail: Requested nodes are busy" every 30s, so slurmctld 
tries to start it again and again.
- If I use --exclusive=user, it works.
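
For example, on our 128 GB / 28-core nodes (131072 MB / 28 ≈ 4681 MB; the exact
values below are only illustrative of the threshold we observe):

        $ srun --mem-per-cpu=4681 -n 1 --exclusive hostname    # works in 18.08.8
        $ srun --mem-per-cpu=4682 -n 1 --exclusive hostname    # stays pending in 18.08.8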

On another cluster, I also tried version 19.05.2: I see the same behaviour.
With slurm-19.05.3, the job is refused with the error: "srun: error: Unable to 
allocate resources: Requested node configuration is not available".

I can't upgrade my production cluster to version 19...  Will there be a patch for 
the 18.08 branch?

We have a workaround: using --exclusive, --ntasks-per-node and (--ntasks or 
--nodes).
But sometimes, in depopulated mode, asking only for --ntasks and --mem-per-cpu 
with --exclusive makes it easy to modify a job by increasing the memory per task 
without knowing the memory size of the node: Slurm calculates how many tasks 
are distributed over the right number of nodes...
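
To be concrete, the workaround looks something like this (the task and node 
counts here are only illustrative, not taken from a real job):

        $ srun --exclusive --ntasks-per-node=4 --ntasks=8 --mem-per-cpu=30000 hostname

whereas what we would like to keep writing is simply:

        $ srun --exclusive --ntasks=8 --mem-per-cpu=30000 hostname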

Was this new behaviour intentional? I can't find anything about it in the 
release notes (except the patch for 19.05.3).

We have academic and non-academic users on the same cluster, and the non-academic 
users ask for --exclusive.

Thank you in advance for your help,
Sincerely,

        Béatrice

-- 
Béatrice CHARTON                |              CRIANN
beatrice.char...@criann.fr      |  745, avenue de l'Université
Tel : +33 (0)2 32 91 42 91      | 76800 Saint Etienne du Rouvray
       ---   Support : supp...@criann.fr   ---
