> > [root@n6 /]# si
> > PARTITION  NODES  NODES(A/I/O/T)  S:C:T  MEMORY  TMP_DISK  TIMELIMIT  AVAIL_FEATURES  NODELIST
> > debug*     6      0/6/0/6         1:4:2  7785    113264    infinite   (null)          c[1-6]
> >
> > (for a moment)
> >
> > [root@n6 /]# si
> > PARTITION  NODES  NODES(A/I/O/T)  S:C:T  MEMORY  TMP_DISK  TIMELIMIT  AVAIL_FEATURES  NODELIST
> > debug*     6      0/0/6/6         1:4:2  7785    113264    infinite   (null)          c[1-6]
> >
0/0/6/6 means your nodes are dying: the columns are allocated/idle/other/total, so all six nodes have moved from idle to "other". You need to look into /var/log/slurm/slurmd.log (or wherever you put the slurmd logs on the machine, as dictated by SlurmdLogFile=) on each of the nodes.

I would predict that there is something wrong with your cgroup.conf. Try:

- confirming that the /etc/slurm/cgroup directory exists on all nodes (as per your cgroup.conf)
- commenting out everything in cgroup.conf except

    CgroupAutomount=yes
    ConstrainCores=yes

Cheers
L.

------
"The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic civics is the insistence that we cannot ignore the truth, nor should we panic about it. It is a shared consciousness that our institutions have failed and our ecosystem is collapsing, yet we are still here — and we are creative agents who can shape our destinies. Apocalyptic civics is the conviction that the only way out is through, and the only way through is together."

*Greg Bloom* @greggish
https://twitter.com/greggish/status/873177525903609857
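For reference, the reading of the NODES(A/I/O/T) column used above can be sketched as a small shell helper. This is only an illustration: `decode_aiot` is a hypothetical name, not a Slurm tool, and the follow-up commands are guarded so the snippet does nothing extra on a machine without Slurm installed.

```shell
#!/bin/sh
# decode_aiot is a hypothetical helper (not part of Slurm). It splits a
# NODES(A/I/O/T) value such as "0/0/6/6" into its four named counts.
# The body runs in a subshell so the IFS change does not leak out.
decode_aiot() (
    set -f          # no filename globbing while word-splitting
    IFS=/
    set -- $1       # split the argument on "/"
    echo "allocated=$1 idle=$2 other=$3 total=$4"
)

decode_aiot "0/6/0/6"   # prints: allocated=0 idle=6 other=0 total=6
decode_aiot "0/0/6/6"   # prints: allocated=0 idle=0 other=6 total=6

# On a real cluster, possible next steps (only run if Slurm is present):
if command -v sinfo >/dev/null 2>&1; then
    sinfo -R                                       # reasons nodes are down or drained
    scontrol show config | grep -i SlurmdLogFile   # where slurmd writes its log
fi
```

A non-zero "other" count equal to the total, as in the second call, is the situation described above: no node is idle or allocated, so the slurmd log on each node is the place to look next.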
