Thank you so much! I have installed slurmd on the head node, started and
enabled the service, and restarted slurmctld. I sent two jobs and they are
running!
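For the record, the head-node setup came down to roughly the following (a
sketch; it assumes slurmd was installed from the same Slurm packages as
slurmctld and that both services run under systemd):

    # Start slurmd now and enable it at boot on the head node
    sudo systemctl enable --now slurmd

    # Restart the controller so it picks the node back up
    sudo systemctl restart slurmctld

    # Confirm the node shows idle rather than down
    sinfo
    scontrol show node head

The squeue output below shows both test jobs running on head: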
[stsadmin@head ~]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                10       lab test_slu stsadmin  R       0:01      1 head
                 9       lab test_slu stsadmin  R       0:09      1 head

On Tue, Apr 9, 2024 at 1:54 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:

> Alison
>
> In your case, since you are using head as both a Slurm management node
> and a compute node, you'll need to set up slurmd on the head node.
>
> Once slurmd is running, use "sinfo" to see what the status of the node
> is. Most likely down, hopefully without an asterisk. If that's the case,
> then use
>
> scontrol update node=head state=resume
>
> and then check the status again. Hopefully the node will show idle,
> meaning that it should be ready to accept jobs.
>
> Jeff
>
> *From:* Alison Peterson <apeters...@sdsu.edu>
> *Sent:* Tuesday, April 9, 2024 3:40 PM
> *To:* Jeffrey R. Lang <jrl...@uwyo.edu>
> *Cc:* slurm-users@lists.schedmd.com
> *Subject:* Re: [EXT] RE: [EXT] RE: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved
>
> Aha! That is probably the issue: slurmd! I know slurmd runs on the
> compute nodes. I need to deploy this for a lab, but I only have one of
> the servers with me. I will be adding them one by one after the first
> one is set up, so as not to disrupt their current setup. I want to be
> able to use the resources from the head node and also the compute nodes
> once it's completed.
>
> [stsadmin@head ~]$ sudo systemctl status slurmd
> Unit slurmd.service could not be found.
>
> [stsadmin@head ~]$ scontrol show node head
> NodeName=head CoresPerSocket=6
>    CPUAlloc=0 CPUEfctv=24 CPUTot=24 CPULoad=0.00
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=(null)
>    NodeAddr=head NodeHostName=head
>    RealMemory=184000 AllocMem=0 FreeMem=N/A Sockets=2 Boards=1
>    State=DOWN+NOT_RESPONDING ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
>    Partitions=lab
>    BootTime=None SlurmdStartTime=None
>    LastBusyTime=2024-04-09T13:20:04 ResumeAfterTime=None
>    CfgTRES=cpu=24,mem=184000M,billing=24
>    AllocTRES=
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a
>    Reason=Not responding [slurm@2024-04-09T10:14:10]
>
> [stsadmin@head ~]$ cat ~/Downloads/test.sh
> #!/bin/bash
> #SBATCH --job-name=test_slurm
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --time=01:00:00
> #SBATCH --output=test_slurm_output.txt
>
> echo "Starting the SLURM test job on: $(date)"
> echo "Running on hostname: $(hostname)"
> echo "SLURM_JOB_ID: $SLURM_JOB_ID"
> echo "SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST"
> echo "SLURM_NTASKS: $SLURM_NTASKS"
>
> # Here you can place the commands you want to run on the compute node,
> # for example a simple sleep command or any application that needs to be tested
> sleep 60
>
> echo "SLURM test job completed on: $(date)"
>
> On Tue, Apr 9, 2024 at 1:21 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
>
> Alison
>
> The sinfo shows that your head node is down due to some configuration
> error.
>
> Are you running slurmd on the head node? If slurmd is running, find the
> log file for it and pass along the entries from it.
>
> Can you redo the scontrol command? "node name" should be "nodename", one
> word.
>
> I need to see what's in the test.sh file to get an idea of how your job
> is set up.
>
> jeff
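Tracking down the slurmd log that Jeff asks about here works roughly like
this (a sketch; it assumes slurmd runs under systemd, and the log path shown
is only a common default — the actual one is set by SlurmdLogFile in
slurm.conf):

    # Is slurmd installed and running at all?
    systemctl status slurmd

    # Recent slurmd entries from the journal
    journalctl -u slurmd -n 50 --no-pager

    # Find the configured log file and inspect it
    scontrol show config | grep -i SlurmdLogFile
    tail -n 50 /var/log/slurm/slurmd.log   # common default path; an assumption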
> *From:* Alison Peterson <apeters...@sdsu.edu>
> *Sent:* Tuesday, April 9, 2024 3:15 PM
> *To:* Jeffrey R. Lang <jrl...@uwyo.edu>
> *Cc:* slurm-users@lists.schedmd.com
> *Subject:* Re: [EXT] RE: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved
>
> Yes! Here is the information:
>
> [stsadmin@head ~]$ sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> lab*         up   infinite      1  down* head
>
> [stsadmin@head ~]$ scontrol show node name=head
> Node name=head not found
>
> [stsadmin@head ~]$ sbatch ~/Downloads/test.sh
> Submitted batch job 7
>
> [stsadmin@head ~]$ squeue
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>                  7       lab test_slu stsadmin PD       0:00      1 (ReqNodeNotAvail, UnavailableNodes:head)
>
> On Tue, Apr 9, 2024 at 1:07 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
>
> Alison
>
> Can you provide the output of the following commands:
>
> · sinfo
> · scontrol show node name=head
>
> and the job command that you're trying to run?
>
> *From:* Alison Peterson <apeters...@sdsu.edu>
> *Sent:* Tuesday, April 9, 2024 3:03 PM
> *To:* Jeffrey R. Lang <jrl...@uwyo.edu>
> *Cc:* slurm-users@lists.schedmd.com
> *Subject:* Re: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved
>
> Hi Jeffrey,
>
> I'm sorry, I did add the head node in the compute nodes configuration.
> This is the slurm.conf:
>
> # COMPUTE NODES
> NodeName=head CPUs=24 RealMemory=184000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN
> PartitionName=lab Nodes=ALL Default=YES MaxTime=INFINITE State=UP OverSubscribe=Force
>
> On Tue, Apr 9, 2024 at 12:57 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
>
> Alison
>
> The error message indicates that there are no resources to execute jobs.
> Since you haven't defined any compute nodes, you will get this error.
>
> I would suggest that you create at least one compute node. Once you do
> that, this error should go away.
>
> Jeff
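Adding the first real compute node later would amount to a second NodeName
line in slurm.conf plus a slurmd on that machine, along these lines (a
sketch; compute01 and its hardware counts are placeholders, not values from
this thread):

    # Hypothetical additional compute node; name and counts are placeholders
    NodeName=compute01 CPUs=24 RealMemory=184000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN
    PartitionName=lab Nodes=head,compute01 Default=YES MaxTime=INFINITE State=UP OverSubscribe=Force

After editing slurm.conf on all nodes, restarting slurmctld (and the slurmd
daemons) makes the controller pick up the new node.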
> *From:* Alison Peterson via slurm-users <slurm-users@lists.schedmd.com>
> *Sent:* Tuesday, April 9, 2024 2:52 PM
> *To:* slurm-users@lists.schedmd.com
> *Subject:* [slurm-users] Nodes required for job are down, drained or reserved
>
> Hi everyone, I'm conducting some tests. I've just set up SLURM on the
> head node and haven't added any compute nodes yet. I'm trying to test it
> to ensure it's working, but I'm encountering an error: "Nodes required
> for the job are DOWN, DRAINED, or reserved for jobs in higher priority
> partitions."
>
> Any guidance will be appreciated, thank you!

--
*Alison Peterson*
IT Research Support Analyst
*Information Technology*
apeters...@sdsu.edu
O: 619-594-3364
*San Diego State University | SDSU.edu <http://sdsu.edu/>*
5500 Campanile Drive | San Diego, CA 92182-8080
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com