I like your documentation, but I would add a few things. I highly recommend not having slurmctld start automatically on reboot. If for some reason the Slurm spool directory isn't available (for example, because it lives on a shared folder that is not mounted), starting slurmctld without it will cause all the jobs to die across the cluster. I always triple-check that the directory is available before starting slurmctld.
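A check like that could be scripted roughly as follows. This is only a sketch, not something from the Wiki: the function name spool_ready and the example path are made up for illustration, and the directory to test should be whatever StateSaveLocation is set to in your slurm.conf.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeed only if the given
# directory exists and a file can actually be created in it.
spool_ready() {
    dir="$1"
    # The directory must exist (e.g. the shared filesystem is mounted).
    [ -d "$dir" ] || return 1
    # Creating a probe file shows the directory is really writable.
    probe=$(mktemp "$dir/.slurm_probe.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"
    return 0
}

# Example use; the path below is an assumption, replace it with your
# own StateSaveLocation:
# spool_ready /var/spool/slurmctld && systemctl start slurmctld
```

To keep the daemon from starting by itself at boot on a systemd machine, "systemctl disable slurmctld" should be enough; you then start it by hand once the check passes.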
I also find it helpful, especially in instances like this, to run the daemons in foreground mode:

# slurmctld -Dvvvv
# slurmd -Dvvvv

This prints any errors directly to the terminal, so you can see right away why the daemon has crashed or failed to start.

--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority

On Fri, 2017-08-25 at 06:08 -0600, Ole Holm Nielsen wrote:
> On 08/25/2017 01:37 PM, Huijun HJ1 Ni wrote:
> > I installed slurm on my cluster, whose OS is CentOS 7.3.
> >
> > After I completed the configuration, I found that it would
> > hang while executing ‘systemctl start slurm’ on compute nodes (but
> > it is ok on the control node where slurmctld runs).
> >
> > But if I used the command ‘systemctl start slurmd’ on compute
> > nodes, that was ok.
> >
> > So is that a defect in slurm, or are there any problems in my
> > configuration? Can you help me?
> >
> > Attached is my configuration.
>
> Please see my HowTo Wiki about Slurm on CentOS/RHEL 7:
> https://wiki.fysik.dtu.dk/niflheim/SLURM
>
> Documentation about starting services:
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration
>
> /Ole
