[slurm-users] Re: launch failed requeued held

2025-01-08 Thread John Hearns via slurm-users
Generally, the troubleshooting steps which you should take for Slurm are:

squeue to look at the list of running/queued or held jobs
sinfo to show which nodes are idle, busy or down
scontrol show node to get more detailed information on a node

For problem nodes - indeed just log into any node t
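The three commands named in this message can be wrapped in a small helper. This is only a sketch: the function names triage_cmds and run_triage and the node name node01 are made up for illustration, and the commands are run with no extra options.

```shell
#!/bin/sh
# Print the three triage commands from the message, one per line.
# $1 is the node to inspect (hypothetical name, e.g. node01).
triage_cmds() {
    node="$1"
    echo "squeue"
    echo "sinfo"
    echo "scontrol show node $node"
}

# Run each triage command in turn, but only if the Slurm client
# tools are actually installed on this machine.
run_triage() {
    if ! command -v squeue >/dev/null 2>&1; then
        echo "Slurm client tools not found on PATH" >&2
        return 1
    fi
    triage_cmds "$1" | while IFS= read -r cmd; do
        echo "== $cmd"
        $cmd    # unquoted on purpose so the line splits into command + args
    done
}
```

Calling run_triage node01 on a login node would print the job list, the node states, and the detailed record for that one node, in that order.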

[slurm-users] Re: launch failed requeued held

2025-01-07 Thread John Hearns via slurm-users
You need to find the node which the job started on. Then look at the slurmd log on that node. You may find an indication of the reason for the failure.

On Tue, 7 Jan 2025 at 11:30, sportlecon sportlecon via slurm-users <slurm-users@lists.schedmd.com> wrote:
> slurm 24.11 - squeue displays reaso
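One possible way to follow the advice in the reply (find the node, then read its slurmd log) is sketched below. It assumes job accounting is enabled so that sacct has a record of the job, and that slurmd logs to a file rather than syslog. The function names and the job ID 12345 are placeholders, not anything from the original thread.

```shell
#!/bin/sh
# Build the sacct query that reports which node(s) a job ran on.
# $1 is the job ID as shown by squeue (12345 below is a placeholder).
job_node_query() {
    echo "sacct -j $1 --format=JobID,State,NodeList"
}

# Show where the job ran, then where slurmd is configured to write its log.
# Assumes accounting is enabled; otherwise sacct has nothing to report.
find_slurmd_log() {
    if ! command -v sacct >/dev/null 2>&1; then
        echo "Slurm client tools not found on PATH" >&2
        return 1
    fi
    $(job_node_query "$1")                        # prints the NodeList column
    scontrol show config | grep -i SlurmdLogFile  # configured slurmd log path
}
```

After find_slurmd_log 12345, log into the node listed under NodeList and read the file named by SlurmdLogFile there, looking at entries around the time the job was launched.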