I like your documentation but I would add a few things:

I highly recommend not having slurmctld start automatically upon
reboot.  If for some reason the slurm spool directory isn't available
(for example, when it lives on a shared folder that hasn't mounted yet),
it will cause all the jobs to die across the cluster.  I always like to
triple check that the directory is available before starting slurmctld.
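
As a rough sketch of that check, something like the following could be
run by hand before starting the controller.  The function names are
made up for illustration, and it assumes the spool directory is set
with a line spelled exactly "StateSaveLocation=" in slurm.conf.  It
only tests that the directory exists and is writable, so an empty mount
point for an unmounted share would still pass if StateSaveLocation is
the mount point itself.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Slurm.

# Print the StateSaveLocation value from a slurm.conf-style file.
# The last matching line wins; commented-out lines are ignored.
state_save_location() {
    sed -n 's/^[[:space:]]*StateSaveLocation[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | tail -n 1
}

# Succeed only if the directory is non-empty, exists, and is writable.
state_dir_ready() {
    [ -n "$1" ] && [ -d "$1" ] && [ -w "$1" ]
}

# Example use on the controller (the config path is an assumption):
#   dir=$(state_save_location /etc/slurm/slurm.conf)
#   state_dir_ready "$dir" && systemctl start slurmctld
```

To keep the controller from starting on its own at boot in the first
place, "systemctl disable slurmctld" should do it on a systemd host.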

I also find it helpful, especially in instances like this, to run the
daemon in foreground mode.

# slurmctld -Dvvvv
# slurmd -Dvvvv

This will print out any errors directly on the terminal and you can see
right away why the daemon has crashed or failed to start.


-- 
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority

On Fri, 2017-08-25 at 06:08 -0600, Ole Holm Nielsen wrote:
> On 08/25/2017 01:37 PM, Huijun HJ1 Ni wrote:
> >           I installed slurm on my cluster whose OS are CentOS7.3.
> > 
> >           After I completed the configuration, I found that it would be
> > hung while executing ‘systemctl start slurm’ on compute nodes(but is ok
> > on control node where slurmctld runs).
> > 
> >           But if I used the command ‘systemctl start slurmd’ on compute
> > nodes, that were ok.
> > 
> >           So is that a defeat for slurm or any problems in my
> > configurations? Can you help me?
> > 
> >           Attachment is my configurations.
> 
> Please see my HowTo Wiki about Slurm on CentOS/RHEL 7:
> https://wiki.fysik.dtu.dk/niflheim/SLURM
> 
> Documentation about starting services:
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration
> 
> /Ole
