Can you ssh into the node and check the actual availability of memory? Maybe there is a zombie process (or a healthy one with a memory leak bug) that's hogging all the memory?
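If it helps, here is a rough Python sketch of the check I have in mind, to be run on the node itself. It reads /proc/meminfo and compares MemTotal with the RealMemory value from slurm.conf. The function names are made up for illustration, and I am assuming that slurmd's detected memory is close to MemTotal expressed in MiB; running `slurmd -C` on the node should show the value slurmd actually detects.

```python
import re


def meminfo_mib(meminfo_text):
    """Parse /proc/meminfo-style text into {field: value in MiB}.

    Only lines reported in kB are kept; counters without a unit
    (for example HugePages_Total) are skipped.
    """
    pattern = re.compile(r"^([^:\s]+):[ \t]+(\d+)[ \t]+kB[ \t]*$", re.MULTILINE)
    return {name: int(kb) // 1024 for name, kb in pattern.findall(meminfo_text)}


def is_low_real_memory(meminfo_text, configured_mib):
    """True if the node's MemTotal is below the configured RealMemory (MiB).

    Assumption: this mirrors the condition behind a 'Low RealMemory' drain
    only approximately; slurmd does its own detection.
    """
    return meminfo_mib(meminfo_text)["MemTotal"] < configured_mib


def check_node(configured_mib, path="/proc/meminfo"):
    """Read the local meminfo file and print a short summary."""
    with open(path) as fh:
        text = fh.read()
    info = meminfo_mib(text)
    print("MemTotal (MiB):    ", info["MemTotal"])
    print("MemAvailable (MiB):", info.get("MemAvailable", "n/a"))
    print("Below configured RealMemory:", is_low_real_memory(text, configured_mib))
```

Calling `check_node(N)` with N set to the RealMemory value for that node in slurm.conf prints the total and available memory. If MemTotal comes out below N, the node is registering with less memory than configured; if MemAvailable is very low, something on the node is eating the memory.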
On Thu, May 25, 2023 at 7:31 AM Roger Mason <rma...@mun.ca> wrote:

> Hello,
>
> Doug Meyer <dameye...@gmail.com> writes:
>
> > Could also review the node log in /var/log/slurm/. Often sinfo -lR will
> > tell you the cause, for example mem not matching the config.
>
> REASON          USER        TIMESTAMP            STATE   NODELIST
> Low RealMemory  slurm(468)  2023-05-25T09:26:59  drain*  node012
> Not responding  slurm(468)  2023-05-25T09:30:31  down*   node[001-003,008]
>
> But, as I said in my response to Ole, the memory in slurm.conf and in
> the 'show node' output match.
>
> Many thanks for the help.
>
> Roger