1 server, 2 sockets with 22 cores each, and 4 hyperthreads per core --> 2*22*4 = 176 "CPUTot", as reported by "scontrol show node".
> I think you should give the relevant node and partition lines from your slurm.conf.
I found the following in node.conf:

    NodeName=taurusml[1-32] Feature=IB Gres=gpu:6 Procs=176 Sockets=2 CoresPerSocket=22 ThreadsPerCore=4 RealMemory=254000 State=UNKNOWN Weight=128
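To double-check that this line matches the 176 "CPUTot" from scontrol, here is a minimal sketch that splits the Key=Value tokens and multiplies the topology fields. The helper names (parse_node_line, hardware_threads, physical_cores) are hypothetical illustrations, not part of Slurm or pyslurm, and the parser assumes the simple whitespace-separated form shown above (no quoted values).

```python
# Hypothetical helpers: parse a slurm.conf-style NodeName line and compare
# Sockets * CoresPerSocket * ThreadsPerCore with the configured Procs value.

NODE_LINE = (
    "NodeName=taurusml[1-32] Feature=IB Gres=gpu:6 Procs=176 Sockets=2 "
    "CoresPerSocket=22 ThreadsPerCore=4 RealMemory=254000 State=UNKNOWN Weight=128"
)


def parse_node_line(line: str) -> dict[str, str]:
    """Split whitespace-separated Key=Value tokens into a dict (values stay strings)."""
    fields = {}
    for token in line.split():
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def hardware_threads(fields: dict[str, str]) -> int:
    """Total hardware threads: sockets x cores per socket x threads per core."""
    return (
        int(fields["Sockets"])
        * int(fields["CoresPerSocket"])
        * int(fields["ThreadsPerCore"])
    )


def physical_cores(fields: dict[str, str]) -> int:
    """Physical cores only: sockets x cores per socket."""
    return int(fields["Sockets"]) * int(fields["CoresPerSocket"])


node = parse_node_line(NODE_LINE)
print(hardware_threads(node), physical_cores(node), node["Procs"])  # 176 44 176
```

So Procs=176 counts hardware threads, while the node has 44 physical cores.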
> Which Slurm version do you run?
19.05.5
> The whypending tool does not appear in a Google search. Where did you get it from and what does it do?
It seems to be a Python script that shows why a job is pending; it uses pyslurm. I thought it was a Slurm tool, but it might be something custom.
> > Most importantly: Does this mean `--cpus-per-task` can be as high as 176 on this node and `--mem-per-cpu` can be up to the reported "RealMemory"/176?
> Yes.
> This is just historical as far as I can tell. I think 'CPU' almost always means 'core'.
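The "RealMemory"/176 question is plain arithmetic, and the answer depends on which meaning of "CPU" applies. A minimal sketch, assuming the full RealMemory (in MB) is available to jobs and ignoring any memory the site reserves for the OS or Slurm:

```python
# Upper bound for --mem-per-cpu on one node under both readings of "CPU".
# Assumption: all of RealMemory (MB) is allocatable; site reservations ignored.

REAL_MEMORY_MB = 254000   # RealMemory from the node.conf line
THREADS = 2 * 22 * 4      # Sockets * CoresPerSocket * ThreadsPerCore = 176
CORES = 2 * 22            # Sockets * CoresPerSocket = 44


def max_mem_per_cpu(real_memory_mb: int, cpus: int) -> int:
    """Largest whole-MB value such that cpus * value <= real_memory_mb."""
    return real_memory_mb // cpus


print(max_mem_per_cpu(REAL_MEMORY_MB, THREADS))  # "CPU" = thread -> 1443
print(max_mem_per_cpu(REAL_MEMORY_MB, CORES))    # "CPU" = core   -> 5772
```

Integer division is used because --mem-per-cpu is given in whole MB by default.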
I just tried a very simple example with 1 task and `--cpus-per-task=50` (slightly more than the 44 physical cores), and it failed with "Requested node configuration is not available".
So, in summary: "CPU" for srun/sbatch/salloc means "(physical) core", whereas "CPU" in scontrol output (and in pyslurm, which seems to wrap it) means "thread". This is confusing, but at least the question now seems to be answered.
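To see why the 50-CPU test points toward the "core" reading, the following sketch checks whether a single task with a given `--cpus-per-task` would fit on this node's hardware under each interpretation. It models only the topology counts; partition limits, select plugin settings, and GRES binding can also cause a rejection and are not modelled, so this is an illustration of the reasoning rather than of Slurm's actual scheduler logic. The function names are hypothetical.

```python
# Does one task with --cpus-per-task=N fit on this node under each reading
# of "CPU"? Only hardware counts are modelled; other scheduler constraints
# (partition limits, select plugin, GRES binding) are ignored.
import math

SOCKETS, CORES_PER_SOCKET, THREADS_PER_CORE = 2, 22, 4


def fits_if_cpu_is_thread(cpus_per_task: int) -> bool:
    return cpus_per_task <= SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE


def fits_if_cpu_is_core(cpus_per_task: int) -> bool:
    return cpus_per_task <= SOCKETS * CORES_PER_SOCKET


def cores_needed_if_cpu_is_thread(cpus_per_task: int) -> int:
    # Whole cores needed if threads are packed THREADS_PER_CORE per core.
    return math.ceil(cpus_per_task / THREADS_PER_CORE)


for n in (44, 50, 176):
    print(n, fits_if_cpu_is_thread(n), fits_if_cpu_is_core(n),
          cores_needed_if_cpu_is_thread(n))
# 44 True True 11
# 50 True False 13
# 176 True False 44
```

Under the "thread" reading, 50 CPUs would need only 13 of the 44 cores, so the observed failure is consistent with "CPU" meaning "core" for the job submission commands on this setup.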
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alexander Grund
Interdisziplinäre Anwendungsunterstützung und Koordination (IAK)
Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Würzburger Str.35/Chemnitzer Str.50, Raum 010
01062 Dresden
Tel.: +49 (351) 463-35982
E-Mail: alexander.gr...@tu-dresden.de
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~