Aha! That is probably the issue: slurmd! I know slurmd runs on the compute nodes. I need to deploy this for a lab, but I only have one of the servers with me; I will be adding the others one by one after the first is set up, so as not to disrupt their current setup. I want to be able to use the resources of the head node and, once everything is in place, of the compute nodes as well.
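My rough plan is to install and start slurmd on the head node itself and then clear its DOWN state. Something like the following (the package name is my guess; I believe it is slurm-slurmd on RHEL/Rocky with EPEL and slurmd on Debian/Ubuntu, so please correct me if not):

# install the compute-node daemon on the head node (package name varies by distro)
sudo dnf install slurm-slurmd
# start it now and on every boot
sudo systemctl enable --now slurmd
# once slurmd has registered with slurmctld, clear the DOWN state
sudo scontrol update NodeName=head State=RESUME

In the meantime, here is what I am seeing right now: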
[stsadmin@head ~]$ sudo systemctl status slurmd
Unit slurmd.service could not be found.

[stsadmin@head ~]$ scontrol show node head
NodeName=head CoresPerSocket=6
   CPUAlloc=0 CPUEfctv=24 CPUTot=24 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=head NodeHostName=head
   RealMemory=184000 AllocMem=0 FreeMem=N/A Sockets=2 Boards=1
   State=DOWN+NOT_RESPONDING ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=lab
   BootTime=None SlurmdStartTime=None LastBusyTime=2024-04-09T13:20:04 ResumeAfterTime=None
   CfgTRES=cpu=24,mem=184000M,billing=24
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0 ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a
   Reason=Not responding [slurm@2024-04-09T10:14:10]

[stsadmin@head ~]$ cat ~/Downloads/test.sh
#!/bin/bash
#SBATCH --job-name=test_slurm
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --output=test_slurm_output.txt

echo "Starting the SLURM test job on: $(date)"
echo "Running on hostname: $(hostname)"
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
echo "SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST"
echo "SLURM_NTASKS: $SLURM_NTASKS"

# Here you can place the commands you want to run on the compute node
# For example, a simple sleep command or any application that needs to be tested
sleep 60

echo "SLURM test job completed on: $(date)"

On Tue, Apr 9, 2024 at 1:21 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:

> Alison
>
> The sinfo output shows that your head node is down due to some
> configuration error.
>
> Are you running slurmd on the head node? If slurmd is running, find
> its log file and pass along the entries from it.
>
> Can you redo the scontrol command? "node name" should be "nodename",
> one word.
>
> I need to see what's in the test.sh file to get an idea of how your
> job is set up.
>
> jeff
>
> From: Alison Peterson <apeters...@sdsu.edu>
> Sent: Tuesday, April 9, 2024 3:15 PM
> To: Jeffrey R. Lang <jrl...@uwyo.edu>
> Cc: slurm-users@lists.schedmd.com
> Subject: Re: [EXT] RE: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved
>
> Yes! Here is the information:
>
> [stsadmin@head ~]$ sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> lab*         up   infinite      1  down* head
>
> [stsadmin@head ~]$ scontrol show node name=head
> Node name=head not found
>
> [stsadmin@head ~]$ sbatch ~/Downloads/test.sh
> Submitted batch job 7
>
> [stsadmin@head ~]$ squeue
>  JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
>      7       lab test_slu stsadmin PD  0:00     1 (ReqNodeNotAvail, UnavailableNodes:head)
>
> On Tue, Apr 9, 2024 at 1:07 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
>
> Alison
>
> Can you provide the output of the following commands:
>
> · sinfo
> · scontrol show node name=head
>
> and the job command that you're trying to run?
>
> From: Alison Peterson <apeters...@sdsu.edu>
> Sent: Tuesday, April 9, 2024 3:03 PM
> To: Jeffrey R. Lang <jrl...@uwyo.edu>
> Cc: slurm-users@lists.schedmd.com
> Subject: Re: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved
>
> Hi Jeffrey,
>
> I'm sorry, I did add the head node in the compute node configuration;
> this is the slurm.conf:
>
> # COMPUTE NODES
> NodeName=head CPUs=24 RealMemory=184000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN
> PartitionName=lab Nodes=ALL Default=YES MaxTime=INFINITE State=UP OverSubscribe=Force
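> (Side note: I wrote that NodeName line by hand from the hardware
> specs. I believe that `slurmd -C`, run on the node once slurmd is
> installed, prints a matching line that can be pasted straight into
> slurm.conf; I haven't been able to verify that here yet, since slurmd
> isn't installed. Something like:
>
> $ slurmd -C
> NodeName=head CPUs=24 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=184000
> )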
> On Tue, Apr 9, 2024 at 12:57 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
>
> Alison
>
> The error message indicates that there are no resources to execute
> jobs. Since you haven't defined any compute nodes, you will get this
> error.
>
> I would suggest that you create at least one compute node. Once you
> do that, this error should go away.
>
> Jeff
>
> From: Alison Peterson via slurm-users <slurm-users@lists.schedmd.com>
> Sent: Tuesday, April 9, 2024 2:52 PM
> To: slurm-users@lists.schedmd.com
> Subject: [slurm-users] Nodes required for job are down, drained or reserved
>
> Hi everyone, I'm conducting some tests. I've just set up SLURM on the
> head node and haven't added any compute nodes yet. I'm trying to test
> it to ensure it's working, but I'm encountering an error: "Nodes
> required for job are DOWN, DRAINED or reserved for jobs in higher
> priority partitions."
>
> Any guidance will be appreciated. Thank you!

--
Alison Peterson
IT Research Support Analyst
Information Technology
apeters...@sdsu.edu
O: 619-594-3364
San Diego State University | SDSU.edu
5500 Campanile Drive | San Diego, CA 92182-8080