Can you ssh into the node and check how much memory is actually available?
Maybe there is a zombie process (or a healthy one with a memory leak)
that's hogging all the memory?
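
To make that check concrete, here is a minimal sketch to run on the node itself. It assumes a Linux node with /proc/meminfo, and that you pass in the RealMemory value (in MB) from your slurm.conf by hand. The function name check_realmemory and the optional second argument are made up for illustration, and the kB-to-MB conversion only approximates what slurmd reports.

```shell
#!/bin/sh
# Compare the total memory the kernel reports with the RealMemory value
# (in MB) configured for this node in slurm.conf.
# Hypothetical helper, not part of Slurm.
# Usage: check_realmemory CONFIGURED_MB [MEMINFO_FILE]
check_realmemory() {
    configured_mb=$1
    meminfo=${2:-/proc/meminfo}   # alternate path is only useful for testing

    # MemTotal is reported in kB; convert to whole MB.
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    total_mb=$((total_kb / 1024))

    if [ "$total_mb" -lt "$configured_mb" ]; then
        echo "LOW: node has ${total_mb} MB, slurm.conf expects ${configured_mb} MB"
        return 1
    fi
    echo "OK: node has ${total_mb} MB, slurm.conf expects ${configured_mb} MB"
}
```

If this prints LOW, the node is registering with less memory than configured, which would be consistent with a "Low RealMemory" drain. Running slurmd -C on the node shows what slurmd itself detects, and free -m or ps aux --sort=-%mem | head will show whether a process is eating the memory.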

On Thu, May 25, 2023 at 7:31 AM Roger Mason <rma...@mun.ca> wrote:

> Hello,
>
> Doug Meyer <dameye...@gmail.com> writes:
>
> > Could also review the node log in /var/log/slurm/. Often sinfo -lR will
> > tell you the cause, for example mem not matching the config.
> >
> REASON               USER         TIMESTAMP           STATE  NODELIST
> Low RealMemory       slurm(468)   2023-05-25T09:26:59 drain* node012
> Not responding       slurm(468)   2023-05-25T09:30:31 down*  node[001-003,008]
>
> But, as I said in my response to Ole, the memory in slurm.conf and in
> the 'show node' output match.
>
> Many thanks for the help.
>
> Roger
>
>
