For what it's worth, we have a similar setup, with one crucial difference: we are handing out physical cores to jobs, not hyperthreads, and we are *not* seeing this behaviour:
$ srun --cpus-per-task=1 -t 10 --mem-per-cpu=1g -A nn9999k -q devel echo foo
srun: job 5371678 queued and waiting for resources
srun: job 5371678 has been allocated resources
foo

$ srun --cpus-per-task=3 -t 10 --mem-per-cpu=1g -A nn9999k -q devel echo foo
srun: job 5371680 queued and waiting for resources
srun: job 5371680 has been allocated resources
foo

We have

SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory

and node definitions like

NodeName=DEFAULT CPUs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=182784 Gres=localscratch:330G Weight=1000

(so we set CPUs to the number of *physical cores*, not *hyperthreads*).
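In case it helps, a minimal sketch of the two conventions side by side (the node names are made up; the other values mirror our definition above):

    # Hand out whole physical cores: CPUs = Sockets * CoresPerSocket
    NodeName=c1-1 CPUs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=182784

    # Hand out hyperthreads instead: CPUs = Sockets * CoresPerSocket * ThreadsPerCore
    NodeName=c1-2 CPUs=80 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=182784

With CR_CPU_Memory, Slurm allocates in units of whatever CPUs is set to, so with the first form each CPU handed to --cpus-per-task is a physical core.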
-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo