Alison

  I’m glad I was able to help.  Good luck.

Jeff

From: Alison Peterson <apeters...@sdsu.edu>
Sent: Tuesday, April 9, 2024 4:09 PM
To: Jeffrey R. Lang <jrl...@uwyo.edu>
Cc: slurm-users@lists.schedmd.com
Subject: Re: [EXT] RE: [EXT] RE: [EXT] RE: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved

Thank you so much!!! I have installed slurmd on the head node. I started and 
enabled the service, then restarted slurmctld. I sent 2 jobs and they are running!

[stsadmin@head ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                10       lab test_slu stsadmin  R       0:01      1 head
                 9       lab test_slu stsadmin  R       0:09      1 head

On Tue, Apr 9, 2024 at 1:54 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
Alison

  In your case, since you are using head as both a Slurm management node and a 
compute node, you’ll need to set up slurmd on the head node.

Once slurmd is running, use “sinfo” to see what the status of the node is.  
Most likely it will be down, hopefully without an asterisk (an asterisk means 
the node is not responding).  If that’s the case, then use

                scontrol update node=head state=resume

and then check the status again.  Hopefully the node will show idle, meaning 
that it should be ready to accept jobs.
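
As a rough sketch of that check-and-resume sequence: the helper below only 
interprets a state string as it appears in the STATE column of sinfo. The 
function name classify_state is hypothetical (it is not a Slurm command), and 
the commented commands assume the node is called head, as in this thread.

```shell
#!/bin/bash
# classify_state: hypothetical helper, not part of Slurm.
# Takes one node state string (e.g. "down*", "down", "idle") and prints a hint.
classify_state() {
    case "$1" in
        down\*) echo "not responding: start slurmd on the node first" ;;
        down)   echo "down: try scontrol update nodename=<node> state=resume" ;;
        idle)   echo "ready to accept jobs" ;;
        *)      echo "other state: $1" ;;
    esac
}

# On a live cluster (assumption: single node named head):
#   sinfo                                          # read the STATE column
#   classify_state 'down'                          # paste the state you saw
#   sudo scontrol update nodename=head state=resume
#   sinfo                                          # should now report idle
```

If the state still carries an asterisk after slurmd has been started, resuming 
the node is unlikely to help until slurmd can reach slurmctld.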


Jeff

From: Alison Peterson <apeters...@sdsu.edu>
Sent: Tuesday, April 9, 2024 3:40 PM
To: Jeffrey R. Lang <jrl...@uwyo.edu>
Cc: slurm-users@lists.schedmd.com
Subject: Re: [EXT] RE: [EXT] RE: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved

Aha! That is probably the issue: slurmd! I know slurmd runs on the compute 
nodes. I need to deploy this for a lab, but I only have one of the servers with 
me. I will be adding the others one by one after the first one is set up, so as 
not to disrupt their current setup. I want to be able to use the resources from 
the head and also the compute nodes once it's completed.

[stsadmin@head ~]$ sudo systemctl status slurmd
Unit slurmd.service could not be found.

[stsadmin@head ~]$ scontrol show node head
NodeName=head CoresPerSocket=6
   CPUAlloc=0 CPUEfctv=24 CPUTot=24 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=head NodeHostName=head
   RealMemory=184000 AllocMem=0 FreeMem=N/A Sockets=2 Boards=1
   State=DOWN+NOT_RESPONDING ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=lab
   BootTime=None SlurmdStartTime=None
   LastBusyTime=2024-04-09T13:20:04 ResumeAfterTime=None
   CfgTRES=cpu=24,mem=184000M,billing=24
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a
   Reason=Not responding [slurm@2024-04-09T10:14:10]

[stsadmin@head ~]$ cat ~/Downloads/test.sh
#!/bin/bash
#SBATCH --job-name=test_slurm
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --output=test_slurm_output.txt

echo "Starting the SLURM test job on: $(date)"
echo "Running on hostname: $(hostname)"
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
echo "SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST"
echo "SLURM_NTASKS: $SLURM_NTASKS"

# Here you can place the commands you want to run on the compute node
# For example, a simple sleep command or any application that needs to be tested
sleep 60

echo "SLURM test job completed on: $(date)"

On Tue, Apr 9, 2024 at 1:21 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
Alison

  The sinfo output shows that your head node is down due to some configuration 
error.

  Are you running slurmd on the head node?  If slurmd is running, find its log 
file and pass along the entries from it.

Can you redo the scontrol command?  “node name” should be “nodename”, one word.

I need to see what’s in the test.sh file to get an idea of how your job is 
set up.
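
One way to locate the slurmd log, sketched under the assumption that the 
controller answers scontrol: the running configuration includes a 
SlurmdLogFile entry. The helper name slurmd_log_path is hypothetical; it only 
filters the text that scontrol show config prints.

```shell
#!/bin/bash
# slurmd_log_path: hypothetical helper, not part of Slurm.
# Reads `scontrol show config` output on stdin and prints the SlurmdLogFile value.
slurmd_log_path() {
    awk -F' *= *' '$1 == "SlurmdLogFile" { print $2 }'
}

# On a live cluster:
#   scontrol show config | slurmd_log_path
# If the value is (null), slurmd logs to syslog; on a systemd host the
# entries can usually be read with:
#   journalctl -u slurmd
```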

jeff

From: Alison Peterson <apeters...@sdsu.edu>
Sent: Tuesday, April 9, 2024 3:15 PM
To: Jeffrey R. Lang <jrl...@uwyo.edu>
Cc: slurm-users@lists.schedmd.com
Subject: Re: [EXT] RE: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved

Yes! Here is the information:

[stsadmin@head ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
lab*         up   infinite      1  down* head

[stsadmin@head ~]$ scontrol show node name=head
Node name=head not found

[stsadmin@head ~]$ sbatch ~/Downloads/test.sh
Submitted batch job 7

[stsadmin@head ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 7       lab test_slu stsadmin PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:head)

On Tue, Apr 9, 2024 at 1:07 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
Alison

Can you provide the output of the following commands:


•         sinfo

•         scontrol show node name=head

and the job command that you're trying to run?



From: Alison Peterson <apeters...@sdsu.edu>
Sent: Tuesday, April 9, 2024 3:03 PM
To: Jeffrey R. Lang <jrl...@uwyo.edu>
Cc: slurm-users@lists.schedmd.com
Subject: Re: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved

Hi Jeffrey,
 I'm sorry, I did add the head node to the compute nodes configuration. This is 
the slurm.conf:

# COMPUTE NODES
NodeName=head CPUs=24 RealMemory=184000 Sockets=2  CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN
PartitionName=lab  Nodes=ALL Default=YES MaxTime=INFINITE State=UP OverSubscribe=Force
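
A quick sanity check on a node definition like this one is that CPUs should 
equal Sockets x CoresPerSocket x ThreadsPerCore (here 2 x 6 x 2 = 24). The 
sketch below does that arithmetic on a single NodeName line; check_node_cpus 
is a hypothetical helper name, not a Slurm tool, and it assumes the whole 
definition sits on one line.

```shell
#!/bin/bash
# check_node_cpus: hypothetical helper, not part of Slurm.
# Reads one NodeName=... line on stdin and compares CPUs with
# Sockets * CoresPerSocket * ThreadsPerCore.
check_node_cpus() {
    awk '{
        # Split every key=value token on the line into the array v.
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            v[kv[1]] = kv[2]
        }
        expected = v["Sockets"] * v["CoresPerSocket"] * v["ThreadsPerCore"]
        if (v["CPUs"] + 0 == expected)
            print "ok: CPUs=" v["CPUs"]
        else
            print "mismatch: CPUs=" v["CPUs"] " expected=" expected
    }'
}

# Usage with the definition from this thread:
#   grep '^NodeName=head' /path/to/slurm.conf | check_node_cpus
```

Once slurmd is installed on the node, slurmd -C prints the hardware it 
detects in slurm.conf format, which can be compared against the same line.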


On Tue, Apr 9, 2024 at 12:57 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
Alison

The error message indicates that there are no resources to execute jobs.   
Since you haven’t defined any compute nodes, you will get this error.

I would suggest that you create at least one compute node.  Once you do that, 
this error should go away.

Jeff

From: Alison Peterson via slurm-users <slurm-users@lists.schedmd.com>
Sent: Tuesday, April 9, 2024 2:52 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Nodes required for job are down, drained or reserved

Hi everyone, I'm conducting some tests. I've just set up SLURM on the head node 
and haven't added any compute nodes yet. I'm trying to test it to ensure it's 
working, but I'm encountering an error: 'Nodes required for the job are DOWN, 
DRAINED, or reserved for jobs in higher priority partitions'.

Any guidance will be appreciated. Thank you!

--
Alison Peterson
IT Research Support Analyst
Information Technology
apeters...@sdsu.edu
O: 619-594-3364
San Diego State University | SDSU.edu<http://sdsu.edu/>
5500 Campanile Drive | San Diego, CA 92182-8080




