Hi Chris,

Working on weekends, hey?
When I do "slurmd -C" on one of my execute nodes, I get:

eric@radonc01:~$ slurmd -C
slurmd: Considering each NUMA node as a socket
NodeName=radonc01 CPUs=32 Boards=1 SocketsPerBoard=4 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64402 UpTime=2-17:35:12

Also, when I do "lscpu" I get:

eric@radonc01:~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 2
Model name:            AMD Opteron(tm) Processor 6376
Stepping:              0
[...]

It seems the two commands give different results(?). What do you think?

_____________________________________________________________________________________________________
Eric F. Alemany
System Administrator for Research

Division of Radiation & Cancer Biology
Department of Radiation Oncology
Stanford University School of Medicine
Stanford, California 94305

Tel: 1-650-498-7969 (No Texting)
Fax: 1-650-723-7382


On May 5, 2018, at 5:42 AM, Chris Samuel <ch...@csamuel.org> wrote:

> On Saturday, 5 May 2018 2:45:19 AM AEST Eric F. Alemany wrote:
>
>> With Ray's suggestion I have an error message for each node. Here I am
>> giving you only one error message, from one node:
>>
>> sacct: error: NodeNames=radonc01 CPUs=32 doesn't match
>> Sockets*CoresPerSocket*ThreadsPerCore (16), resetting CPUs
>>
>> The interesting thing is, if you follow the
>> Sockets*CoresPerSocket*ThreadsPerCore formula, 2x8x2 = 32; however, look
>> above and it says (16). Strange, no?
>
> No, Slurm is right. CPUs != threads.
>
> You've got 16 CPU cores, each with 2 threads. So in this configuration you
> can schedule 16 tasks per node and each task can use 2 threads.
>
> What does "slurmd -C" say on that node?
>
> All the best,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
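[Editor's note: the arithmetic behind the two reports above can be sketched as follows. This is a quick illustrative check, not from the thread; the dictionary names are made up, and the values are taken from the slurmd -C and lscpu output quoted above. Both views multiply out to the same 32 logical CPUs; they only disagree on how those CPUs are grouped, because slurmd -C treats each NUMA node as a socket.]

```python
# Hypothetical sketch: the two topology views of node radonc01 quoted above.
# slurmd -C treats each NUMA node as a socket, so it reports
# 4 sockets x 8 cores x 1 thread; lscpu reports the physical layout,
# 2 sockets x 8 cores x 2 threads.
slurmd_view = {"sockets": 4, "cores_per_socket": 8, "threads_per_core": 1}
lscpu_view  = {"sockets": 2, "cores_per_socket": 8, "threads_per_core": 2}

def cpus(view):
    # Slurm's consistency check: Sockets * CoresPerSocket * ThreadsPerCore
    return view["sockets"] * view["cores_per_socket"] * view["threads_per_core"]

print(cpus(slurmd_view))  # 32 logical CPUs
print(cpus(lscpu_view))   # 32 logical CPUs -- the totals agree

# Physical cores (what Chris calls "16 CPU cores, each with 2 threads"):
print(lscpu_view["sockets"] * lscpu_view["cores_per_socket"])  # 16
```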