I have a number of nodes that, since our transition to CentOS 7.3/SLURM 
17.02, have begun to occasionally show a status of "Not responding". The 
health check we run on each node every 5 minutes detects no problems, and the 
nodes are perfectly healthy once I set their state to "idle". slurmd keeps 
running uninterrupted, and the nodes receive jobs immediately after going back 
online.

Has anyone on this list seen similar behavior? I have increased logging to 
debug/verbose, but have seen no errors worth noting.


