[slurm-users] error no error

2025-02-12 Thread Ricardo Román-Brenes via slurm-users
Hello. Could someone enlighten me as to what this error message is? Feb 13 10:02:00 gpu1 slurmd[573705]: slurmd: error: slurm_msg_sendto: address:port=192.168.9.1:36698 msg_type=8001: No error

[slurm-users] jobs getting stuck in CG

2025-02-10 Thread Ricardo Román-Brenes via slurm-users
Hello everyone. I have a cluster of 16 nodes, 4 of which have GPUs, with no particular configuration to manage them. The filesystem is Gluster, and authentication is via slapd/munge. My problem is that jobs very frequently get stuck in CG, at least one a day. I have no idea why th