It looks like you have hyper-threading turned on but haven't defined ThreadsPerCore=2. You either need to turn off hyper-threading in the BIOS or change the definition of ThreadsPerCore in slurm.conf.
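Each field in the slurmd error line quoted in the original message is printed as `Key=configured:hardware(hw)`, so the disagreeing fields can be picked out mechanically. The following is a minimal Python sketch, not a Slurm tool. `config_mismatches` is a hypothetical helper, and it assumes the error line keeps the format shown in the quoted log.

```python
import re
from typing import Dict, Tuple

# Each field in the slurmd error is printed as Key=configured:hardware(hw).
FIELD = re.compile(r"(\w+)=(\d+):(\d+)\(hw\)")


def config_mismatches(line: str) -> Dict[str, Tuple[int, int]]:
    """Return {field: (slurm.conf value, detected value)} for fields that differ."""
    mismatches = {}
    for key, configured, hardware in FIELD.findall(line):
        if int(configured) != int(hardware):
            mismatches[key] = (int(configured), int(hardware))
    return mismatches


if __name__ == "__main__":
    # Error line copied from node003's slurmd log.
    error = (
        "slurmd[400766]: error: Node configuration differs from hardware: "
        "CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw) "
        "CoresPerSocket=12:12(hw) ThreadsPerCore=1:2(hw)"
    )
    for key, (configured, hardware) in config_mismatches(error).items():
        print(f"{key}: slurm.conf={configured} hardware={hardware}")
```

For node003, this prints only `CPUs: slurm.conf=24 hardware=48` and `ThreadsPerCore: slurm.conf=1 hardware=2`. Sockets and cores already agree. Because the NodeName line does not set CPUs explicitly, Slurm derives it from Sockets × CoresPerSocket × ThreadsPerCore. Adding ThreadsPerCore=2 therefore corrects both fields.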
Mike

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Robert Kudyba <rkud...@fordham.edu>
Reply-To: Slurm User Community List <slurm-users@lists.schedmd.com>
Date: Thursday, April 23, 2020 at 08:27
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [External] [slurm-users] slurmd: error: Node configuration differs from hardware: CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw)

Running Slurm 20.02 on CentOS 7.7 on Bright Cluster 8.2. slurm.conf is on the head node. I don't see these errors on the other 2 nodes. After restarting slurmd on node003 I see this:

    slurmd[400766]: error: Node configuration differs from hardware: CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw) CoresPerSocket=12:12(hw) ThreadsPerCore=1:2(hw)
    Apr 23 10:05:49 node003 slurmd[400766]: Message aggregation disabled
    Apr 23 10:05:49 node003 slurmd[400766]: CPU frequency setting not configured for this node
    Apr 23 10:05:49 node003 slurmd[400770]: CPUs=24 Boards=1 Sockets=2 Cores=12 Threads=1 Memory=191880 TmpDisk=2038 Uptime=2488268 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)

From slurm.conf:

    # Nodes
    NodeName=node[001-003] CoresPerSocket=12 RealMemory=191800 Sockets=2 Gres=gpu:v100:1
    # Partitions
    $O Hidden=NO OverSubscribe=FORCE:12 GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=N$
    PartitionName=gpuq Default=NO MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidde$
    # Generic resources types
    GresTypes=gpu,mic
    SelectType=select/cons_tres
    SelectTypeParameters=CR_CPU
    SchedulerTimeSlice=60
    EnforcePartLimits=YES

    lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 48
    On-line CPU(s) list: 0-47
    Thread(s) per core: 2
    Core(s) per socket: 12
    Socket(s): 2
    NUMA node(s): 2
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 85
    Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
    Stepping: 4
    CPU MHz: 2600.000
    BogoMIPS: 5200.00
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 1024K
    L3 cache: 19712K
    NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
    NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47

    cat /etc/slurm/cgroup.conf | grep -v '#'
    CgroupMountpoint="/sys/fs/cgroup"
    CgroupAutomount=no
    AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
    TaskAffinity=no
    ConstrainCores=no
    ConstrainRAMSpace=no
    ConstrainSwapSpace=no
    ConstrainDevices=no
    ConstrainKmemSpace=yes
    AllowedRamSpace=100
    AllowedSwapSpace=0
    MinKmemSpace=30
    MaxKmemPercent=100
    MaxRAMPercent=100
    MaxSwapPercent=100
    MinRAMSpace=30

What else can I check?
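On the node itself, `slurmd -C` prints the hardware layout that slurmd detects, in slurm.conf NodeName syntax. That output is the most direct thing to compare against the configured line. The same arithmetic can also be reproduced from the lscpu output in the question. The Python sketch here is illustrative only: `parse_lscpu` and `node_line` are hypothetical helpers. They assume lscpu's English "Field: value" labels, and RealMemory is taken from the existing slurm.conf entry rather than detected.

```python
from typing import Dict


def parse_lscpu(text: str) -> Dict[str, str]:
    """Split lscpu's "Field: value" lines into a dict (English locale assumed)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def node_line(name: str, lscpu_text: str, real_memory: int) -> str:
    """Build a hypothetical slurm.conf NodeName line matching the detected topology."""
    f = parse_lscpu(lscpu_text)
    sockets = int(f["Socket(s)"])
    cores = int(f["Core(s) per socket"])
    threads = int(f["Thread(s) per core"])
    cpus = sockets * cores * threads  # logical CPUs, the number slurmd reports as (hw)
    # Sanity check against lscpu's own total, when present.
    if "CPU(s)" in f and int(f["CPU(s)"]) != cpus:
        raise ValueError("lscpu CPU(s) does not equal sockets * cores * threads")
    return (
        f"NodeName={name} CPUs={cpus} Sockets={sockets} "
        f"CoresPerSocket={cores} ThreadsPerCore={threads} RealMemory={real_memory}"
    )


# Subset of the lscpu output posted for node003.
LSCPU_NODE003 = """\
CPU(s): 48
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
"""

if __name__ == "__main__":
    print(node_line("node003", LSCPU_NODE003, real_memory=191800))
```

With node003's values, this yields `NodeName=node003 CPUs=48 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=191800`. Gres=gpu:v100:1 from the existing definition would still need to be appended. Node001 and node002 do not report the error, so they may have hyper-threading disabled. If so, a single ThreadsPerCore=2 on node[001-003] would not match them, and they would need their own NodeName line.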