Hi,

Some of our nodes are currently overloaded. The NHC we have installed used to check the load and drain a node when it was overloaded. However, for the past few days it has not been showing the state of the node. When I run /usr/sbin/nhc manually, it says:

    20230130 21:25:14 [slurm] /usr/libexec/nhc/node-mark-online mcn26.chicagobooth.edu
    /usr/libexec/nhc/node-mark-online: Not sure how to handle node state "" on mcn26.chicagobooth.edu
    /usr/libexec/nhc/node-mark-online: Skipping node mcn26.chicagobooth.edu ( )

It seems that NHC is not able to read the state of the node. I ran scontrol show node mcn26 and got:

    NodeName=mcn26 Arch=x86_64 CoresPerSocket=16 NodeAddr=mcn26 NodeHostName=mcn26 Version=20.11.8

Any idea what happened, and why NHC is no longer reading the state of the node?

Best,
*Fritz Ratnasamy*
Data Scientist
Information Technology