Dear Loris: Many thanks for your response.
I changed the IDLE state to UNKNOWN in the NodeName configuration,
then reloaded *slurmctld*, and two GPU nodes (gpu3 & gpu4) came up in drain mode.
I have since manually set them back to the IDLE state.
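For reference, the manual change was made with a command along these lines
(State=RESUME would also clear the drain flag):

scontrol update NodeName=gpu[3-4] State=IDLE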
But how do I change the CoresPerSocket and ThreadsPerCore in the
NodeName parameter?
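For context, I am thinking of something along these lines for the GPU nodes;
the socket and core counts here are only placeholders, not our actual hardware layout:

NodeName=gpu[3-4] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 Gres=gpu:1 State=UNKNOWN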
Thanks & Regards,
Sudeep Narayan Banerjee
On 18/05/20 7:29 pm, Loris Bennett wrote:
Hi Sudeep,
I am not sure if this is the cause of the problem, but in your slurm.conf
you have
# COMPUTE NODES
NodeName=node[1-10] Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 Procs=16 RealMemory=60000 State=IDLE
NodeName=gpu[1-2] CPUs=16 Gres=gpu:2 State=IDLE
NodeName=node[11-22] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 Procs=32 State=IDLE
NodeName=node[23-24] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 Procs=40 State=IDLE
NodeName=gpu[3-4] CPUs=32 Gres=gpu:1 State=IDLE
But if you read
man slurm.conf
you will find the following under the description of the parameter
"State" for nodes:
"IDLE" should not be specified in the node configuration, but set the
node state to "UNKNOWN" instead.
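So, taking the last of your node lines as an example, the definition would
look something like this, with only the State value changed:

NodeName=gpu[3-4] CPUs=32 Gres=gpu:1 State=UNKNOWN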
Cheers,
Loris
Sudeep Narayan Banerjee <snbaner...@iitgn.ac.in> writes:
Dear Loris: I am very sorry for addressing you as Support; it has
become a bad habit of mine, which I will change. Sincere apologies!
Yes, I have tried this while adding the hybrid mix of hardware, but when
slurmctld runs it reports a mismatch in the core count; the existing
32-core nodes go into Down/Drng mode and the new 40-core nodes are set
to IDLE.
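For reference, the node state and drain reason can be checked with commands
along these lines (node11 here is just an example of one of the affected nodes):

sinfo -R
scontrol show node node11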
Any help or pointer to relevant documentation would be highly appreciated!
Thanks & Regards,
Sudeep Narayan Banerjee
System Analyst | Scientist B
Information System Technology Facility
Academic Block 5 | Room 110
Indian Institute of Technology Gandhinagar
Palaj, Gujarat 382355 INDIA
On 18/05/20 6:30 pm, Loris Bennett wrote:
Dear Sudeep,
Sudeep Narayan Banerjee <snbaner...@iitgn.ac.in> writes:
Dear Support,
This mailing list is not really the Slurm support list. It is just the
Slurm User Community List, so basically a bunch of people just like you.
node[11-22] each have 2 sockets with 16 cores per socket, and node[23-24]
each have 2 sockets with 20 cores per socket. In the slurm.conf file
(attached), can we merge all of nodes 11-24 (which have different core
counts) under a single queue or partition name?
Yes, you can have a partition consisting of heterogeneous nodes. Have
you tried this? Was there a problem?
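As a rough illustration (the partition name and options here are only
placeholders, not taken from your configuration), a single partition
spanning all of those nodes could be defined along these lines:

PartitionName=batch Nodes=node[11-24] Default=YES MaxTime=INFINITE State=UP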
Cheers,
Loris