I’m pretty sure that you should only need to restart slurmd on the node that was reporting the problem. If the mismatch put the node into a drained state, you may need to manually undrain it using scontrol.
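A minimal sketch of those two steps, assuming a systemd-managed slurmd and that node003 is the affected node:

```shell
# On node003: restart the node daemon so it re-reads the hardware config
systemctl restart slurmd

# From the head node (or anywhere with scontrol access): if the node was
# drained by the configuration mismatch, return it to service
scontrol update NodeName=node003 State=RESUME

# Confirm the node's state afterwards
sinfo -N -n node003
```

This is an operational command fragment rather than a script; `State=RESUME` clears a DRAIN set by the slurmd error without cancelling running jobs.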
Testing job performance is not the job of the scheduler; it just schedules the jobs that you tell it to. You’ll need to run those tests yourself.

Mike

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Robert Kudyba <rkud...@fordham.edu>
Reply-To: Slurm User Community List <slurm-users@lists.schedmd.com>
Date: Thursday, April 23, 2020 at 12:55
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] [External] slurmd: error: Node configuration differs from hardware: CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw)

On Thu, Apr 23, 2020 at 1:43 PM Michael Robbert <mrobb...@mines.edu> wrote:

    It looks like you have hyper-threading turned on, but haven’t defined ThreadsPerCore=2. You either need to turn off hyper-threading in the BIOS or change the definition of ThreadsPerCore in slurm.conf.

Nice find. node003 has hyper-threading enabled but node001 and node002 do not:

[root@node001 ~]# dmidecode -t processor | grep -E '(Core Count|Thread Count)'
        Core Count: 12
        Thread Count: 12
        Core Count: 12
        Thread Count: 12

[root@node003 ~]# dmidecode -t processor | grep -E '(Core Count|Thread Count)'
        Core Count: 12
        Thread Count: 24
        Core Count: 12

I found a great mini script to disable hyper-threading without a reboot. I did get the following warning, but I don't think it's a big issue:

WARNING, didn't collect load info for all cpus, balancing is broken

Do I have to restart slurmctld on the head node and/or slurmd on node003?

Side question: are there ways with Slurm to test if hyper-threading improves performance and job speed?
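The original "mini script" isn't shown, but the usual no-reboot approach is to offline each hyperthread sibling through sysfs. A hedged sketch (assumes the standard Linux sysfs CPU topology layout and root privileges; `first_sibling` is a helper name introduced here, not from the thread):

```shell
#!/bin/sh
# Offline hyperthread siblings without a reboot (run as root on the node).
# thread_siblings_list may be comma-separated ("0,24") or a range ("0-1");
# the first id listed is treated as the primary thread of the core.

# first_sibling: return the first CPU id from a thread_siblings_list value
first_sibling() {
    printf '%s' "$1" | sed 's/[-,].*//'
}

for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    id=${cpu##*cpu}
    list=$(cat "$cpu/topology/thread_siblings_list" 2>/dev/null) || continue
    # If this CPU is not the first thread of its core, it is an HT sibling
    if [ "$id" != "$(first_sibling "$list")" ] && [ -w "$cpu/online" ]; then
        echo 0 > "$cpu/online"   # take the sibling thread offline
    fi
done
```

Note this only changes the running kernel; the BIOS setting (or the slurm.conf `ThreadsPerCore` definition) is still what should be made consistent across nodes, since sysfs changes don't survive a reboot.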
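On the side question: Slurm won't benchmark for you, but once node003 has `ThreadsPerCore=2` in slurm.conf you can A/B the same job with and without hyperthread placement using the real `--hint` option. A sketch, where `./mybench` stands in for whatever workload you want to time (hypothetical name, not from the thread):

```shell
# Pack tasks onto hyperthread siblings
sbatch -w node003 -n 24 --hint=multithread   --wrap="./mybench"

# One task per physical core, siblings left idle
sbatch -w node003 -n 24 --hint=nomultithread --wrap="./mybench"
```

Comparing the elapsed times of the two runs (e.g. via `sacct -o JobID,Elapsed`) gives a direct answer for that particular workload; HT helps some codes and hurts others, so per-application testing like this is the only reliable check.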