ps -eaf --forest is your friend with Slurm: it prints the process tree, so
you can see which processes are still hanging under slurmstepd.
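
A rough sketch of how I would look at a node with a job stuck in CG, assuming Linux with the procps version of ps. The helper names (show_tree, filter_d_state) are made up for illustration and are not part of Slurm. The second helper keeps the header line plus any process whose state starts with D (uninterruptible sleep), which is the symptom described in the quoted message below.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a node with a job stuck in CG.
# Assumes procps ps on Linux; neither function is part of Slurm.

# Print the full process tree so slurmstepd and its children show up together.
show_tree() {
    ps -eaf --forest
}

# Read `ps -eo pid,ppid,stat,wchan:32,args` style output on stdin and keep
# the header line plus every process whose STAT (third column) starts with D,
# i.e. uninterruptible sleep, usually a process blocked on I/O.
filter_d_state() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# Usage on the affected compute node:
#   show_tree | less
#   ps -eo pid,ppid,stat,wchan:32,args | filter_d_state
```

If the second command lists processes belonging to the stuck job, the WCHAN column can hint at what they are waiting on, which would be consistent with a file system that is not answering.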

On Mon, Feb 10, 2025, 12:08 PM Michał Kadlof via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> I observed similar symptoms when we had issues with the shared Lustre file
> system. When the file system couldn't complete an I/O operation, the
> process in Slurm remained in the CG state until the file system became
> responsive again. An additional symptom was that the blocking process was
> stuck in the D state.
> On 10/02/2025 09:28, Ricardo Román-Brenes via slurm-users wrote:
>
> Hello everyone.
>
> I have a cluster composed of 16 nodes, with 4 of them having GPUs with no
> particular configuration to manage them.
> The filesystem is gluster, authentication via slapd/munge.
>
> My problem is that very frequently, at least once a day, a job gets
> stuck in the CG (completing) state. I have no idea why this happens.
> Manually killing the slurmstepd process releases the node, but this is in
> no way a manageable solution. Has anyone experienced this (and fixed it)?
>
> Thank you.
>
> -Ricardo
>
> --
> Best regards,
> *Michał Kadlof*
> Head of the high performance computing center
> EdenN cluster administrator
> Structural and Functional Genomics Laboratory
> Faculty of Mathematics and Computer Science
> Warsaw University of Technology
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>