Thank you for your response.

I have found out why there was no error in the log: I've been
looking at the wrong log. The error didn't occur on the master, but on
our vpn-gateway (it is a hybrid cloud setup) - but you can think of it as
just another worker in the same network. The error I get there is:

`
Feb 08 11:38:25 cluster-vpngtw-3ts770ji3a8ubr1-0 slurmctld[32014]: slurmctld: fatal: auth/jwt: cannot stat '/etc/slurm/jwt-secret.key': No such file or directory
Feb 08 11:38:25 cluster-vpngtw-3ts770ji3a8ubr1-0 systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Feb 08 11:38:25 cluster-vpngtw-3ts770ji3a8ubr1-0 systemd[1]: slurmctld.service: Failed with result 'exit-code'.
Feb 08 11:38:25 cluster-vpngtw-3ts770ji3a8ubr1-0 systemd[1]: Failed to start Slurm controller daemon.
`

In the past we have created the `jwt-secret.key` on the master at
`/etc/slurm` and that was enough. I must admit that I am not
completely familiar with it, so I will now look into it more closely and
also double-check whether such a key is stored there in the old Slurm
version.
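
A quick way to compare the hosts could look like the sketch below. The
helper name `check_jwt_key` is made up for illustration, and the default
path is simply the one from the error message above; it only tests whether
the file exists and is readable by the user running the check.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): report whether a JWT key file
# exists and is readable by the current user.
# The default path is taken from the slurmctld error message.
check_jwt_key() {
    key_path="${1:-/etc/slurm/jwt-secret.key}"
    if [ ! -e "$key_path" ]; then
        echo "missing: $key_path"
        return 1
    fi
    if [ ! -r "$key_path" ]; then
        echo "not readable: $key_path"
        return 2
    fi
    echo "ok: $key_path"
}
```

Running `check_jwt_key` on both the master and the vpn-gateway should show
whether the file is absent on only one of them. Since readability is checked
for the current user, it probably makes sense to run it as the user that
slurmctld runs as.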

Best regards,
Xaver

On 08.02.24 11:07, Luke Sudbery via slurm-users wrote:
Your systemctl output shows that slurmctld is running OK, but that doesn't 
match with your first entry, so it's hard to tell what's going on.

But if slurmctld won't start under systemd and it's not clear why, the first 
step would be to enable something like `SlurmctldDebug = debug` and check the 
full logs in journalctl, or just run slurmctld in the foreground with:

/usr/sbin/slurmctld -D -vvv

Make sure the system service is properly stopped and there aren't any rogue 
slurmctld processes anywhere.
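
As a rough sketch of that last check (the helper name `count_procs` is made 
up for illustration, and it assumes `pgrep` is installed on the node):

```shell
#!/bin/sh
# Hypothetical helper: print how many running processes have exactly the
# given name. pgrep exits non-zero when nothing matches, so its status is
# ignored and only the number of printed PIDs is counted.
count_procs() {
    { pgrep -x "$1" || true; } | wc -l | tr -d ' '
}
```

After `systemctl stop slurmctld`, `count_procs slurmctld` should print 0 
before you start the daemon in the foreground.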

Many thanks,

Luke


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
