Hello,

I'm trying to set up NHC[0] for our Slurm cluster, but I can't get it
to work properly.

I'm using the dev branch from [0] and compiled it this way:

$ ./autogen.sh --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
$ make test
$ sudo make install

When I run nhc, I get an error that sshd is not running:

$ sudo nhc
ERROR:  nhc:  Health check failed:  check_ps_service:  Service sshd (process sshd) owned by root not running

I know sshd is running because I logged in to this machine over ssh, and
`systemctl status sshd` shows it as active.
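
Another way to cross-check this is to look at the owner and command name
that ps itself reports for anything matching "ssh". The following is only a
sketch: it assumes the procps version of ps that Ubuntu ships, and
`procs_matching` is a made-up helper name, not something that comes with NHC.

```shell
# Hypothetical helper (not part of NHC): keep the header line plus every row
# whose command-name column (3rd field) matches the given pattern.
procs_matching() {
    awk -v pat="$1" 'NR == 1 || $3 ~ pat'
}

# -e selects all processes; -o chooses the output columns. "user:20" is procps
# syntax that widens the user column so long user names are printed in full.
ps -e -o user:20,pid,comm | procs_matching ssh
```

The output shows which user owns each matching process and the exact
command name, which can then be compared with what the nhc.conf lines ask for.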

Here's a sample of my nhc.conf:

   * || check_ps_service munged
   * || check_ps_service -u root sshd
   * || check_ps_service -u root ssh
   * || check_ps_service ssh
   * || check_ps_service sshd

If I run `sudo nhc -a` to run all the tests, it gives 4 errors about
ssh.

NHC can find munge running, so what's the problem with ssh? What am I
missing?

I'm using Ubuntu 20.04.

Cheers,
Heitor


[0] https://github.com/mej/nhc/
