Hello.
Could someone explain what this error message means?
Feb 13 10:02:00 gpu1 slurmd[573705]: slurmd: error: slurm_msg_sendto: address:port=192.168.9.1:36698 msg_type=8001: No error
--
Hello everyone.
I have a cluster of 16 nodes, 4 of which have GPUs; there is no
particular configuration to manage the GPUs.
The filesystem is GlusterFS, and authentication is via slapd/munge.
My problem is that very frequently, at least once a day, a job gets
stuck in the CG (completing) state. I have no idea why th