I'm running Slurm 20.02 on CentOS 7.7 with Bright Cluster 8.2. slurm.conf is on the head node. After restarting slurmd on node003 I see the messages below; I don't see these errors on the other 2 nodes.
slurmd[400766]: error: Node configuration differs from hardware: CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw) CoresPerSocket=12:12(hw) ThreadsPerCore=1:2(hw)
Apr 23 10:05:49 node003 slurmd[400766]: Message aggregation disabled
Apr 23 10:05:49 node003 slurmd[400766]: CPU frequency setting not configured for this node
Apr 23 10:05:49 node003 slurmd[400770]: CPUs=24 Boards=1 Sockets=2 Cores=12 Threads=1 Memory=191880 TmpDisk=2038 Uptime=2488268 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)

From slurm.conf:

# Nodes
NodeName=node[001-003] CoresPerSocket=12 RealMemory=191800 Sockets=2 Gres=gpu:v100:1
# Partitions
$O Hidden=NO OverSubscribe=FORCE:12 GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=N$
PartitionName=gpuq Default=NO MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidde$
# Generic resources types
GresTypes=gpu,mic
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
SchedulerTimeSlice=60
EnforcePartLimits=YES

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
Stepping:              4
CPU MHz:               2600.000
BogoMIPS:              5200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              19712K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47

cat /etc/slurm/cgroup.conf | grep -v '#'
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=no
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
TaskAffinity=no
ConstrainCores=no
ConstrainRAMSpace=no
ConstrainSwapSpace=no
ConstrainDevices=no
ConstrainKmemSpace=yes
AllowedRamSpace=100
AllowedSwapSpace=0
MinKmemSpace=30
MaxKmemPercent=100
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30

What else can I check?
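
To make the first slurmd error line easier to read, here is a minimal Python sketch that splits each Name=configured:detected(hw) pair and reports only the fields where the two values differ. It is only an illustration and not part of Slurm; the names LOG_LINE, PAIR_RE and find_mismatches are made up for this example, and the pattern assumes the fields always look like the ones in the log above.

```python
import re

# Error text copied from the slurmd log quoted above.
LOG_LINE = (
    "error: Node configuration differs from hardware: "
    "CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw) "
    "CoresPerSocket=12:12(hw) ThreadsPerCore=1:2(hw)"
)

# Assumed field shape: Name=<configured>:<detected>(hw)
PAIR_RE = re.compile(r"(\w+)=(\d+):(\d+)\(hw\)")


def find_mismatches(line):
    """Return {field: (configured, hardware)} for fields whose values differ."""
    return {
        name: (int(conf), int(hw))
        for name, conf, hw in PAIR_RE.findall(line)
        if conf != hw
    }


if __name__ == "__main__":
    for name, (conf, hw) in find_mismatches(LOG_LINE).items():
        print(f"{name}: configured={conf} hardware={hw}")
```

On the line from my log this flags CPUs (24 configured, 48 detected) and ThreadsPerCore (1 configured, 2 detected); Boards, SocketsPerBoard and CoresPerSocket match.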