That output of slurmd -C is your answer.
slurmd only sees about 6 GB of memory (RealMemory=6097), while your
slurm.conf claims about 10 GB (RealMemory=10193).
I would run a memory test, look at /proc/meminfo on the node, etc.
Maybe even check that the type and size of the memory installed in
there is what you think it is.
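
As a rough way to do the meminfo check, here is a minimal sketch that
pulls MemTotal out of /proc/meminfo and converts it to MB. The helper
name mem_total_mb is made up for illustration, and I am assuming the
result is only roughly comparable to the RealMemory that slurmd -C
prints, not that slurmd computes it exactly this way.

```python
import re


def mem_total_mb(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, converted from kB to MB.

    Hypothetical helper for illustration; integer division by 1024 is an
    assumption about how to compare against Slurm's RealMemory (in MB).
    """
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) // 1024


def read_node_mem_mb(path="/proc/meminfo"):
    """Read the file on the node itself and return MemTotal in MB."""
    with open(path) as fh:
        return mem_total_mb(fh.read())
```

Calling read_node_mem_mb() on node012 and comparing the number with
RealMemory in slurm.conf should show whether the kernel itself sees
less memory than you configured.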
Brian Andrus
On 5/25/2023 7:30 AM, Roger Mason wrote:
Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes:
1. Is slurmd running on the node?
Yes.
2. What's the output of "slurmd -C" on the node?
NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2
ThreadsPerCore=1 RealMemory=6097
3. Define State=UP in slurm.conf instead of UNKNOWN
Will do.
4. Why have you configured TmpDisk=0? It should be the size of the
/tmp filesystem.
I have not configured TmpDisk. This is the entry in slurm.conf for that
node:
NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2
ThreadsPerCore=1 RealMemory=10193 State=UNKNOWN
But I do notice that slurmd -C now says there is less memory than
configured.
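
To make the difference explicit, here is a small sketch that compares
the two node lines quoted above key by key. The function names
parse_node_line and mismatches are hypothetical, and the parsing
assumes simple space-separated Key=Value tokens as in those lines.

```python
def parse_node_line(line):
    """Split a 'Key=Value Key=Value ...' node line into a dict of strings."""
    return dict(token.split("=", 1) for token in line.split())


def mismatches(detected, configured):
    """Return {key: (configured_value, detected_value)} where the two differ."""
    return {
        key: (configured[key], value)
        for key, value in detected.items()
        if key in configured and configured[key] != value
    }


# Output of "slurmd -C" on the node, as quoted in this thread.
detected = parse_node_line(
    "NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2 "
    "ThreadsPerCore=1 RealMemory=6097"
)
# The slurm.conf entry for the same node, as quoted in this thread.
configured = parse_node_line(
    "NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2 "
    "ThreadsPerCore=1 RealMemory=10193 State=UNKNOWN"
)

print(mismatches(detected, configured))
```

Only RealMemory differs between the two lines: 10193 configured
against 6097 detected.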
Thanks again.
Roger