[slurm-dev] Re: An issue about slurm on CentOS 7.3

Ole Holm Nielsen Mon, 28 Aug 2017 00:36:51 -0700


On 08/25/2017 06:19 PM, Nicholas McCollum wrote:

I like your documentation but I would add a few things:


I highly recommend not having the slurmctld start automatically upon
reboot.  If for some reason the slurm spool directory isn't available
(on a shared folder) it will cause all the jobs to die across the
cluster.  I always like to triple check to make sure that the directory
is available before starting the slurmctld.

I also find it helpful, especially in instances like this, to run the
daemon in foreground mode.

# slurmctld -Dvvvv
# slurmd -Dvvvv

This will print out any errors directly on the terminal and you can see
right away why the daemon has crashed or failed to start.

Thanks for your nice comments. I added a section about manual daemon startup to cover the scenario you describe:

https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#manual-startup-of-services
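
A minimal sketch of the kind of pre-start check described above, not taken from the wiki page. It assumes the spool directory is the one named by StateSaveLocation in slurm.conf and that the parameter is spelled with exactly that capitalization. The helper names state_save_location and state_dir_ready are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the StateSaveLocation value from a slurm.conf file.
# Assumes the parameter name is written exactly as "StateSaveLocation".
state_save_location() {
    conf=$1
    sed -n 's/^[[:space:]]*StateSaveLocation[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$conf" | tail -n 1
}

# Hypothetical helper: succeed only if the directory exists and is writable
# by the current user. An unmounted shared folder that leaves no directory
# behind fails this check.
state_dir_ready() {
    dir=$1
    [ -n "$dir" ] && [ -d "$dir" ] && [ -w "$dir" ]
}
```

With automatic startup turned off (systemctl disable slurmctld), the controller could then be started by hand only when the check passes, for example: state_dir_ready "$(state_save_location /etc/slurm/slurm.conf)" && systemctl start slurmctld. The path /etc/slurm/slurm.conf is an assumption and depends on how Slurm was installed. Note that an empty mount point directory would still pass this simple check.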

It's difficult to foresee every kind of problem which may occur, but it's good to have common scenarios in the documentation.

Our Slurm master server only has local storage, but I suppose that you need shared remote storage for Slurm HA controllers?


/Ole
