Run 'sinfo -R' to see why any nodes are down or drained.
If they are down, it may be as simple as running 'scontrol update
nodename=xxxx state=resume' to bring them back. Whether that is enough
depends on the reason they went down (if that is the issue).
Otherwise, run 'scontrol show job xxx' and check the job requirements to
see what it is asking for that does not exist.
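
The steps above can be sketched as a few small shell helpers. This is only an illustration: the function names are made up for this example, and the node name and job ID you pass in stand for the 'xxxx' and 'xxx' placeholders above. Only resume a node once the reason reported by 'sinfo -R' has actually been dealt with.

```shell
#!/bin/sh
# Hypothetical helper names; each one just wraps a command quoted above.

# List nodes that are down, drained or failing, with the recorded reason.
show_down_reasons() {
    sinfo -R
}

# Ask the controller to return one node to service.
# $1 = node name (the 'xxxx' placeholder in the message above).
resume_node() {
    scontrol update nodename="$1" state=resume
}

# Print everything the job requested (partition, node count, features, ...).
# $1 = job ID (the 'xxx' placeholder in the message above).
show_job_request() {
    scontrol show job "$1"
}
```

For the job in the original question, that would be 'show_down_reasons' first, then 'show_job_request 26' if no nodes turn out to be down.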
Brian Andrus
On 1/4/2025 3:41 AM, John Hearns via slurm-users wrote:
Please post the output of sinfo and squeue.
Also look at the slurmd log on one of the affected nodes.
'tail -f' is your friend.
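
A rough sketch of those checks, in case it helps. The function names are invented for this example, and it assumes 'scontrol show config' reports a SlurmdLogFile entry on your cluster; the value may not be a plain file path (for example when slurmd logs to syslog, or when the path contains per-node substitutions), so check what it prints before relying on it.

```shell
#!/bin/sh
# Hypothetical helper names; the commands themselves are standard Slurm tools.

# Node and queue overview, i.e. the output asked for above.
cluster_overview() {
    sinfo
    squeue
}

# Print the configured slurmd log location as reported by the controller.
# Assumes lines of the form "SlurmdLogFile = /some/path".
slurmd_log_path() {
    scontrol show config | awk -F' *= *' '$1 == "SlurmdLogFile" { print $2 }'
}

# Follow the slurmd log. Run this on the compute node itself,
# since the log lives on the node, not on the controller.
follow_slurmd_log() {
    tail -f "$(slurmd_log_path)"
}
```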
On Sat, Jan 4, 2025, 8:13 AM sportlecon sportlecon via slurm-users
<slurm-users@lists.schedmd.com> wrote:
JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
   26       cpu myscript user1 PD  0:00     4 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
Can anyone help to fix this?
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com