Hi Chris, hi Marcus,

I just want to understand the cause, too, so I'll try to sum it up.
Chris, you have

    CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2

and

    srun -C gpu -N 1 --ntasks-per-node=80 hostname

works.

Marcus has configured

    CPUs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=2

(slurmd -C says CPUs=96 Boards=1 SocketsPerBoard=4 CoresPerSocket=12 ThreadsPerCore=2) together with CR_ONE_TASK_PER_CORE, and

    srun -n 48                        WORKS
    srun -N 1 --ntasks-per-node=48    DOESN'T WORK

I'm not sure whether this is caused by CR_ONE_TASK_PER_CORE, but it is at least one of the major differences. I'm wondering whether the effort to force the use of physical cores only is being applied twice: once by removing the 48 threads (CPUs=48 instead of 96) and again by setting CR_ONE_TASK_PER_CORE.

My impression is that with CR_ONE_TASK_PER_CORE, --ntasks-per-node accounts for threads (you have set ThreadsPerCore=2), hence only 24 may work, whereas CR_ONE_TASK_PER_CORE doesn't affect the 'cores only' selection with --ntasks.

We don't use CR_ONE_TASK_PER_CORE. Instead, our users set either -c 2 or --hint=nomultithread, which results in core-only allocation. You could also enforce this with a job-submit plugin or a Lua plugin.

The fact that CR_ONE_TASK_PER_CORE is described as "under changed" in the public bugs, and that there is a non-accessible bug about it, probably means it is better not to use it unless you have to.

Best,
Andreas

On 2/20/19 7:49 AM, Chris Samuel wrote:
> On Tuesday, 19 February 2019 10:14:21 PM PST Marcus Wagner wrote:
>
>> sbatch -N 1 --ntasks-per-node=48 --wrap hostname
>> submission denied, got jobid 199805
> On one of our 40 core nodes with 2 hyperthreads:
>
> $ srun -C gpu -N 1 --ntasks-per-node=80 hostname | uniq -c
>      80 nodename02
>
> The spec is:
>
> CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2
>
> Hope this helps!
>
> All the best,
> Chris

--
Dr. Andreas Henkel
Operativer Leiter HPC
Zentrum für Datenverarbeitung
Johannes Gutenberg Universität
Anselm-Franz-von-Bentzelweg 12
55099 Mainz
Telefon: +49 6131 39 26434
OpenPGP Fingerprint: FEC6 287B EFF3 7998 A141 03BA E2A9 089F 2D8E F37E
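
For reference, the CPU counts in the summary above follow from simple arithmetic on the node specs. The following is a minimal Python sketch, not anything Slurm itself runs: the function names are hypothetical, and the figure of 24 only restates the guess made above about how --ntasks-per-node might be counted under CR_ONE_TASK_PER_CORE. It is not confirmed Slurm behaviour.

```python
def logical_cpus(sockets, cores_per_socket, threads_per_core):
    """Logical CPUs (hardware threads) on a node."""
    return sockets * cores_per_socket * threads_per_core


def physical_cores(sockets, cores_per_socket):
    """Physical cores on a node, ignoring hyperthreads."""
    return sockets * cores_per_socket


# Chris's node: SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2
chris_cpus = logical_cpus(2, 20, 2)

# Marcus's node as reported by slurmd -C: 4 sockets, 12 cores, 2 threads
marcus_hw_cpus = logical_cpus(4, 12, 2)

# Marcus's configured CPUs=48 matches the physical core count
marcus_cores = physical_cores(4, 12)

# Hypothesis from this thread only (an assumption, not verified):
# if the 48 configured CPUs were counted again as threads with
# ThreadsPerCore=2, at most 48 / 2 tasks per node would fit.
hypothetical_task_limit = marcus_cores // 2

print("Chris logical CPUs:", chris_cpus)
print("Marcus logical CPUs (slurmd -C):", marcus_hw_cpus)
print("Marcus physical cores (configured CPUs):", marcus_cores)
print("Hypothetical tasks-per-node limit:", hypothetical_task_limit)
```

If that guess holds, a request for 48 tasks per node would exceed the limit while 24 would fit, which would be consistent with the rejected sbatch call quoted above.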