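A quick way to check whether this is a configuration mismatch is to set config_overrides (an option of SlurmdParameters) in your slurm.conf and see whether the node then responds to scontrol update (e.g. State=RESUME). That option is really only meant for testing, so take it back out once you know the answer.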
________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Brian Andrus <toomuc...@gmail.com>
Sent: Thursday, May 25, 2023 10:54 AM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Nodes stuck in drain state

That output of slurmd -C is your answer. Slurmd only sees 6GB of memory and you are claiming it has 10GB. I would run some memtests, look at meminfo on the node, etc. Maybe even check that the type/size of memory in there is what you think it is.

Brian Andrus

On 5/25/2023 7:30 AM, Roger Mason wrote:
> Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> writes:
>
>> 1. Is slurmd running on the node?
> Yes.
>
>> 2. What's the output of "slurmd -C" on the node?
> NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2
> ThreadsPerCore=1 RealMemory=6097
>
>> 3. Define State=UP in slurm.conf instead of UNKNOWN
> Will do.
>
>> 4. Why have you configured TmpDisk=0? It should be the size of the
>> /tmp filesystem.
> I have not configured TmpDisk. This is the entry in slurm.conf for that
> node:
> NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 CoresPerSocket=2
> ThreadsPerCore=1 RealMemory=10193 State=UNKNOWN
>
> But I do notice that slurmd -C now says there is less memory than
> configured.
>
> Thanks again.
>
> Roger
>
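
If it helps, here is a rough sketch of the comparison Brian is describing: take the node line from slurm.conf and the line printed by slurmd -C, and report any numeric field where the configured value is larger than what slurmd detects. The function names (parse_node_line, compare_node) are made up for illustration and are not part of Slurm, and the check Slurm itself performs may be more involved than this simple comparison. The two input lines are the ones quoted in this thread.

```python
import re

# Numeric node fields that appear in both lines quoted in this thread.
FIELDS = ("CPUs", "Boards", "SocketsPerBoard", "CoresPerSocket",
          "ThreadsPerCore", "RealMemory")


def parse_node_line(line):
    """Split a 'Key=Value Key=Value ...' node line into a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def compare_node(configured, detected, fields=FIELDS):
    """Return {field: (configured, detected)} where configured > detected."""
    conf = parse_node_line(configured)
    det = parse_node_line(detected)
    over = {}
    for key in fields:
        if key in conf and key in det and int(conf[key]) > int(det[key]):
            over[key] = (int(conf[key]), int(det[key]))
    return over


# Node entry from slurm.conf, as posted by Roger.
slurm_conf_line = ("NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 "
                   "CoresPerSocket=2 ThreadsPerCore=1 RealMemory=10193 "
                   "State=UNKNOWN")

# Output of "slurmd -C" on the node, as posted by Roger.
slurmd_c_line = ("NodeName=node012 CPUs=4 Boards=1 SocketsPerBoard=2 "
                 "CoresPerSocket=2 ThreadsPerCore=1 RealMemory=6097")

print(compare_node(slurm_conf_line, slurmd_c_line))
# prints {'RealMemory': (10193, 6097)}
```

Only RealMemory comes back, which matches the diagnosis above: the CPU layout agrees, but the node is configured with more memory than slurmd can see.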