Hi Chris,

Re: "can't run more than 1 job per node at a time."
try "scontrol show config" and grep for defmem IIRC by default the memory request for any job is all the memory in a node. Regards, Alex On Thu, Apr 4, 2019 at 4:01 PM Andy Riebs <andy.ri...@hpe.com> wrote: > in slurm.conf, on the line(s) starting "NodeName=", you'll want to add > specs for sockets, cores, and threads/core. > > ------------------------------ > *From:* Chris Bateson <cbate...@vt.edu> <cbate...@vt.edu> > *Sent:* Thursday, April 04, 2019 5:18PM > *To:* Slurm-users <slurm-users@lists.schedmd.com> > <slurm-users@lists.schedmd.com> > *Cc:* > *Subject:* [slurm-users] Slurm 1 CPU > I should start out by saying that I am extremely new to anything HPC. Our > end users purchased a 20 node cluster which a vendor set up for us with > Bright/Slurm. > > After our vendor said everything was complete and we started migrating our > users workflow to the new cluster they discovered that they can't run more > than 1 job per node at a time. We started researching enabling consumable > resources which I believe we've done so however we're getting the same > result. > > I've just discovered today that both *scontrol show node* and *sinfo -lNe* > show that each of our nodes have 1 CPU. I'm guessing that's why we can't > submit more than 1 job at a time. I'm trying to determine where is it > getting this information and how can I get it to display the correct CPU > information. > > Sample info: > > *scontrol show node* > > NodeName=cnode001 Arch=x86_64 CoresPerSocket=1 > CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.01 > AvailableFeatures=(null) > ActiveFeatures=(null) > Gres=(null) > NodeAddr=cnode001 NodeHostName=cnode001 Version=17.11 > OS=Linux 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 > RealMemory=192080 AllocMem=0 FreeMem=188798 Sockets=1 Boards=1 > State=IDLE ThreadsPerCore=1 TmpDisk=2038 Weight=1 Owner=N/A > MCS_label=N/A > Partitions=defq > BootTime=2019-03-26T14:28:24 SlurmdStartTime=2019-03-26T14:29:55 > CfgTRES=cpu=1,mem=192080M,billing=1 > AllocTRES= > CapWatts=n/a > CurrentWatts=0 LowestJoules=0 ConsumedJoules=0 > ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s > > > *sinfo -lNe* > > NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK > WEIGHT AVAIL_FE REASON > cnode001 1 defq* idle 1 1:1:1 192080 2038 > 1 (null) none > > > *lscpu* > > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Byte Order: Little Endian > CPU(s): 48 > On-line CPU(s) list: 0-47 > Thread(s) per core: 1 > Core(s) per socket: 24 > Socket(s): 2 > NUMA node(s): 2 > Vendor ID: GenuineIntel > CPU family: 6 > Model: 85 > Model name: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz > Stepping: 4 > CPU MHz: 2700.000 > BogoMIPS: 5400.00 > Virtualization: VT-x > L1d cache: 32K > L1i cache: 32K > L2 cache: 1024K > L3 cache: 33792K > NUMA node0 CPU(s): > 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46 > NUMA node1 CPU(s): > 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47 > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr > pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe > syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts > rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq > dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c > rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_pt tpr_shadow vnmi > flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms > invpcid rtm cqm mpx rdt_a 
avx512f avx512dq rdseed adx smap clflushopt clwb > avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc > cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts > > > *slrum.conf SelectType Configuration* > > SelectType=select/cons_res > SelectTypeParameters=CR_Core_Memory > PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL > PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO > Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL > AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=YES OverTimeLimit=0 > State=UP Nodes=cnode[001-020] > > > > I can provide other configs if you feel that it could help. > > Any ideas? I would have thought that slurm would grab the CPU information > from the CPU instead of the configuration. > > Thanks > Chris > > > >
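
Putting Andy's suggestion and the DefMem check together, something like the
following should be close. It is only a sketch based on the lscpu and
"scontrol show node" output quoted above -- it assumes all 20 nodes match
cnode001, and the DefMemPerCPU value is purely an illustration, not a
recommendation:

# Check what the controller currently uses as the default memory request
scontrol show config | grep -i defmem

# slurm.conf: describe the real hardware on the NodeName line(s), and
# optionally set a per-CPU default so a job no longer requests a whole
# node's worth of memory by default
NodeName=cnode[001-020] Sockets=2 CoresPerSocket=24 ThreadsPerCore=1 RealMemory=192080
DefMemPerCPU=4000

After changing the NodeName definition, restart slurmctld and the slurmd
daemons (an "scontrol reconfigure" alone may not pick up node definition
changes); the nodes should then report CPUTot=48 in "scontrol show node".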