Aha! That is probably the issue: slurmd! I know slurmd runs on the compute
nodes. I need to deploy this for a lab, but I only have one of the servers
with me. I will add the others one by one after the first one is set up, so
as not to disrupt their current setup. I want to be able to use the resources
of the head node as well as the compute nodes once it's completed.

[stsadmin@head ~]$ sudo systemctl status slurmd
Unit slurmd.service could not be found.

[stsadmin@head ~]$ scontrol show node head
NodeName=head CoresPerSocket=6
   CPUAlloc=0 CPUEfctv=24 CPUTot=24 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=head NodeHostName=head
   RealMemory=184000 AllocMem=0 FreeMem=N/A Sockets=2 Boards=1
   State=DOWN+NOT_RESPONDING ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=lab
   BootTime=None SlurmdStartTime=None
   LastBusyTime=2024-04-09T13:20:04 ResumeAfterTime=None
   CfgTRES=cpu=24,mem=184000M,billing=24
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a
   Reason=Not responding [slurm@2024-04-09T10:14:10]
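
A minimal sketch for reading the output above, assuming the usual
`scontrol show node` field layout. The function name check_node_state is
hypothetical and not part of Slurm; it only inspects the State= and
SlurmdStartTime= fields to tell whether slurmd has ever registered with the
controller.

```shell
#!/bin/bash
# Hypothetical helper, not part of Slurm: reads `scontrol show node <name>`
# output on stdin and reports whether slurmd appears to have registered.
check_node_state() {
    local text state started
    text=$(cat)
    # Extract the values of the State= and SlurmdStartTime= fields.
    state=$(printf '%s\n' "$text" | grep -oE '(^| )State=[^ ]+' | head -n 1 | cut -d= -f2)
    started=$(printf '%s\n' "$text" | grep -oE 'SlurmdStartTime=[^ ]+' | head -n 1 | cut -d= -f2)
    if [ -z "$started" ] || [ "$started" = "None" ]; then
        echo "slurmd has not registered (State=${state:-unknown})"
        return 1
    fi
    echo "slurmd registered at $started (State=${state:-unknown})"
}

# Usage (needs a working scontrol on the machine):
#   scontrol show node head | check_node_state
```

In the output above, SlurmdStartTime=None together with
State=DOWN+NOT_RESPONDING is consistent with slurmd never having started on
the head node.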

[stsadmin@head ~]$ cat ~/Downloads/test.sh
#!/bin/bash
#SBATCH --job-name=test_slurm
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --output=test_slurm_output.txt

echo "Starting the SLURM test job on: $(date)"
echo "Running on hostname: $(hostname)"
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
echo "SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST"
echo "SLURM_NTASKS: $SLURM_NTASKS"

# Here you can place the commands you want to run on the compute node.
# For example, a simple sleep command or any application that needs to be
# tested.
sleep 60

echo "SLURM test job completed on: $(date)"
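
A rough sketch of the likely next step, under stated assumptions: slurmd
gets installed on the head node with a systemd unit named "slurmd" (the
package that provides it varies by distribution and is not shown here), and
the existing slurm.conf and authentication setup are already valid for that
node. The helper names run and bring_up_node and the DRY_RUN switch are
hypothetical. By default the script only prints the commands so they can be
reviewed first.

```shell
#!/bin/bash
# Sketch: start slurmd on the head node and clear the DOWN state.
# Assumes slurmd is installed and ships a systemd unit named "slurmd".
NODE="${NODE:-head}"      # node name from the slurm.conf quoted in this thread
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bring_up_node() {
    # Start slurmd now and enable it at boot.
    run sudo systemctl enable --now slurmd
    # A node marked DOWN may need an explicit resume, depending on the
    # ReturnToService setting in slurm.conf.
    run sudo scontrol update NodeName="$NODE" State=RESUME
    # Check that the node has left the down* state.
    run sinfo
}
```

Calling bring_up_node prints the three commands; DRY_RUN=0 bring_up_node
would execute them. Whether the State=RESUME step is needed at all depends
on the site's ReturnToService setting, so treat it as optional.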

On Tue, Apr 9, 2024 at 1:21 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:

> Alison
>
>
>
>   The sinfo output shows that your head node is down due to some
> configuration error.
>
>
>
>   Are you running slurmd on the head node?  If slurmd is running, find the
> log file for it and pass along the entries from it.
>
>
>
> Can you redo the scontrol command? “node name” should be “nodename”, one
> word.
>
>
>
> I need to see what’s in the test.sh file to get an idea of how your job is
> set up.
>
>
>
> jeff
>
>
>
> *From:* Alison Peterson <apeters...@sdsu.edu>
> *Sent:* Tuesday, April 9, 2024 3:15 PM
> *To:* Jeffrey R. Lang <jrl...@uwyo.edu>
> *Cc:* slurm-users@lists.schedmd.com
> *Subject:* Re: [EXT] RE: [EXT] RE: [slurm-users] Nodes required for job
> are down, drained or reserved
>
>
>
> Yes! here is the information:
>
>
>
> [stsadmin@head ~]$ sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> lab*         up   infinite      1  down* head
>
>
> [stsadmin@head ~]$ scontrol show node name=head
> Node name=head not found
>
>
> [stsadmin@head ~]$ sbatch ~/Downloads/test.sh
> Submitted batch job 7
>
>
> [stsadmin@head ~]$ squeue
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>                  7       lab test_slu stsadmin PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:head)
>
>
>
> On Tue, Apr 9, 2024 at 1:07 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
>
> Alison
>
>
>
> Can you provide the output of the following commands:
>
>
>
> ·         sinfo
>
> ·         scontrol show node name=head
>
>
>
> and the job command that you're trying to run?
>
>
>
>
>
>
>
> *From:* Alison Peterson <apeters...@sdsu.edu>
> *Sent:* Tuesday, April 9, 2024 3:03 PM
> *To:* Jeffrey R. Lang <jrl...@uwyo.edu>
> *Cc:* slurm-users@lists.schedmd.com
> *Subject:* Re: [EXT] RE: [slurm-users] Nodes required for job are down,
> drained or reserved
>
>
>
> Hi Jeffrey,
>
>  I'm sorry, I did add the head node in the compute nodes configuration.
> This is the slurm.conf:
>
>
>
> # COMPUTE NODES
> NodeName=head CPUs=24 RealMemory=184000 Sockets=2  CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN
> PartitionName=lab  Nodes=ALL Default=YES MaxTime=INFINITE State=UP OverSubscribe=Force
>
>
>
>
>
> On Tue, Apr 9, 2024 at 12:57 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
>
> Alison
>
>
>
> The error message indicates that there are no resources to execute jobs.
> Since you haven’t defined any compute nodes you will get this error.
>
>
>
> I would suggest that you create at least one compute node.  Once you do
> that, this error should go away.
>
>
>
> Jeff
>
>
>
> *From:* Alison Peterson via slurm-users <slurm-users@lists.schedmd.com>
> *Sent:* Tuesday, April 9, 2024 2:52 PM
> *To:* slurm-users@lists.schedmd.com
> *Subject:* [slurm-users] Nodes required for job are down, drained or
> reserved
>
>
>
> Hi everyone, I'm conducting some tests. I've just set up SLURM on the head
> node and haven't added any compute nodes yet. I'm trying to test it to
> ensure it's working, but I'm encountering an error: 'Nodes required for the
> job are DOWN, DRAINED, or reserved for jobs in higher priority partitions.'
>
>
>
> Any guidance will be appreciated, thank you!
>
>
>
> --
>
> *Alison Peterson*
>
> IT Research Support Analyst
> *Information Technology*
>
> apeters...@sdsu.edu <mfar...@sdsu.edu>
>
> O: 619-594-3364
>
> *San Diego State University | **SDSU.edu <http://sdsu.edu/>*
>
> 5500 Campanile Drive | San Diego, CA 92182-8080
>
>
>
>
>
>


-- 
*Alison Peterson*
IT Research Support Analyst
*Information Technology*
apeters...@sdsu.edu <mfar...@sdsu.edu>
O: 619-594-3364
*San Diego State University | SDSU.edu <http://sdsu.edu/>*
5500 Campanile Drive | San Diego, CA 92182-8080
