On 2/10/25 7:05 am, Michał Kadlof via slurm-users wrote:
> I observed similar symptoms when we had issues with the shared Lustre
> file system. When the file system couldn't complete an I/O operation,
> the process in Slurm remained in the CG state until the file system
> became responsive again.
ps -eaf --forest is your friend with Slurm
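To make that tip concrete, here is a minimal sketch, assuming a Linux node with procps ps and awk. The function names filter_dstate and list_dstate are made up for illustration. It lists processes in uninterruptible sleep (state D), which is what a task blocked on a hung file system usually looks like:

```shell
# Keep the header line plus every process whose STAT column starts with D
# (uninterruptible sleep, typically a process blocked on I/O).
# Expects "PID STAT WCHAN COMMAND" style lines on stdin.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Hypothetical helper: run ps on the local node and keep only D-state processes.
list_dstate() {
    ps -eo pid,stat,wchan:32,args | filter_dstate
}

# For the full tree, so that job processes show up under slurmstepd:
#   ps -eaf --forest
```

Running list_dstate on the node that holds the CG job should show whether one of its processes is stuck in D state. If so, the job will probably not finish completing until that I/O returns.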
On Mon, Feb 10, 2025, 12:08 PM Michał Kadlof via slurm-users <
slurm-users@lists.schedmd.com> wrote:
> I observed similar symptoms when we had issues with the shared Lustre file
> system. When the file system couldn't complete an I/O operation, the
> process in Slurm remained in the CG state until the file system became
> responsive again.
I observed similar symptoms when we had issues with the shared Lustre
file system. When the file system couldn't complete an I/O operation,
the process in Slurm remained in the CG state until the file system
became responsive again. An additional symptom was that the blocking
process was stuck
Belay that reply. Different issue.
In that case salloc works OK but srun says the user has no job on the node.
On Mon, Feb 10, 2025, 9:24 AM John Hearns wrote:
> I have had something similar.
> The fix was to run
> scontrol reconfig
> which causes a reread of the slurmd config.
> Give that a try.
I have had something similar.
The fix was to run
scontrol reconfig
which causes a reread of the slurmd config.
Give that a try.
It might be scontrol reread. Check the manual.
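On the command name: as far as I know, the subcommand documented in the scontrol man page is reconfigure, and I am not aware of a reread subcommand, so please confirm with man scontrol on your version. A minimal sketch, assuming it is run as root or the SlurmUser on a host with the Slurm client tools installed. The wrapper name reconfigure_slurm is made up for illustration:

```shell
# Hypothetical wrapper: ask the Slurm daemons to re-read their configuration.
# Returns non-zero with a message if scontrol is not on PATH.
reconfigure_slurm() {
    if ! command -v scontrol >/dev/null 2>&1; then
        echo "scontrol not found on PATH" >&2
        return 1
    fi
    scontrol reconfigure
}
```

Call reconfigure_slurm once, then retry the failing srun to see whether the node now recognises the job.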
On Mon, Feb 10, 2025, 8:32 AM Ricardo Román-Brenes via slurm-users <
slurm-users@lists.schedmd.com> wrote:
> Hello everyone.