Hello,

I have Slurm 17.11 installed on a 64-core server. My 9 partitions are all set with
OverSubscribe=NO, so I would expect that once all 64 cores are assigned to jobs,
Slurm would simply leave new jobs in the PENDING state. Instead it keeps starting
new jobs, so that more than 64 cores end up assigned. Looking at the slurmctld log,
we can see that cores 21-22 and 24-38 are currently in use in more than one
partition:

[2018-04-16T15:00:00.439] node:katak cpus:64 c:8 s:8 t:1 mem:968986 
a_mem:231488 state:11
[2018-04-16T15:00:00.439] part:ibismini rows:1 prio:10
[2018-04-16T15:00:00.439]   row0: num_jobs 6: bitmap: 4,6-12,16-33,48-55
[2018-04-16T15:00:00.439] part:ibisinter rows:1 prio:10
[2018-04-16T15:00:00.439]   row0: num_jobs 1: bitmap: 24-41
[2018-04-16T15:00:00.439] part:ibismax rows:1 prio:10
[2018-04-16T15:00:00.439]   row0: num_jobs 3: bitmap: 21-22,24-38,42-47,56-63
[2018-04-16T15:00:00.439] part:rclevesq rows:1 prio:10
[2018-04-16T15:00:00.439] part:ibis1 rows:1 prio:10
[2018-04-16T15:00:00.439] part:ibis2 rows:1 prio:10
[2018-04-16T15:00:00.439]   row0: num_jobs 1: bitmap: 32-37

So jobs in different partitions are now sharing the same cores: for example, the
ibismini row (16-33) and the ibismax row (21-22,24-38) overlap on cores 21-22 and
24-33. I don't understand why this happens, since OverSubscribe is set to NO on
every partition.
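
For reference, here is a rough sketch of what the relevant slurm.conf entries look
like (node and partition names and the priority are taken from the log above; only
the six partitions visible in the snippet are shown, and the remaining three follow
the same pattern):

  NodeName=katak Sockets=8 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=968986
  PartitionName=ibismini  Nodes=katak OverSubscribe=NO Priority=10
  PartitionName=ibisinter Nodes=katak OverSubscribe=NO Priority=10
  PartitionName=ibismax   Nodes=katak OverSubscribe=NO Priority=10
  PartitionName=rclevesq  Nodes=katak OverSubscribe=NO Priority=10
  PartitionName=ibis1     Nodes=katak OverSubscribe=NO Priority=10
  PartitionName=ibis2     Nodes=katak OverSubscribe=NO Priority=10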

Thanks for your help!

---
Stéphane Larose
IT Analyst
Institut de Biologie Intégrative et des Systèmes (IBIS)
Pavillon Charles-Eugène-Marchand
Université Laval
