A few thoughts…

1) I am not sure whether Slurm can run “all-in-one”, with the controller, 
worker, and accounting DB all on one host. If anyone else knows whether this 
is doable, please chime in. (I actually have a request to do this for a single 
machine at work, where the researchers want many people to share a single GPU 
compute server by running a job scheduler and submitting their jobs to it.)

2) Maybe put your bash script on a web site (pastebin.com, GitHub gist, etc.) 
so we can take a look at what you are doing.

3) I am not sure how you are starting the slurmctld / slurmd services the 
first time. Do you know whether you are running them via systemd (the Ubuntu 
service manager in 16.04/18.04)? If so, what do 'systemctl status slurmctld' 
and 'systemctl status slurmd' output?
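
In case it helps, here is a rough sketch of how that status output could be 
collected in one go. It assumes the unit names are slurmctld and slurmd, which 
may not match a tarball install; the function name check_slurm_units is just 
something I made up for illustration.

```shell
#!/bin/sh
# Print how systemd sees each Slurm daemon.
# Assumption: the units are named slurmctld and slurmd; adjust if your
# unit files use different names.
check_slurm_units() {
    if ! command -v systemctl >/dev/null 2>&1; then
        # No systemctl on this host, so the daemons are started some other way.
        echo "systemctl not found; services are probably not managed by systemd"
        return 0
    fi
    for unit in slurmctld slurmd; do
        echo "== $unit =="
        # --no-pager prints the status directly instead of opening a pager.
        # A stopped or missing unit makes systemctl exit non-zero, so the
        # failure is ignored to let the loop continue.
        systemctl status "$unit" --no-pager || true
    done
}

check_slurm_units
```

If a unit shows up as failed, the last few log lines that systemctl prints 
under the status are usually the most useful part to paste into a reply.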

Let’s start with that.

HTH,
Will

On May 5, 2018, at 2:44 PM, Kenneth Russell <linux-...@comcast.net> wrote:

I am a new Slurm user and am trying to set up a single-node test system. I 
have spent endless hours trying to get the Slurm services to start. I am 
running Ubuntu Server 16.04 and Slurm 17.11.5. My motherboard has an 8-core 
AMD processor. When I try to start the slurmdbd or slurmctld services, I get 
messages saying that shared libraries cannot be accessed or that pid files are 
missing. At times, I have noticed that the pid files in /var/run have been 
deleted. I have made copies of the pid files and copy them back to /var/run 
when they are missing.

I have found that if I reinstall Slurm from the tarball, the services will 
start. To speed things up, I have created a bash script to reinstall Slurm, 
starting with the tarball extraction step. This is a very inefficient 
workaround.

Can anyone help me figure out why Slurm runs only once and then fails on 
subsequent starts?

I can send copies of conf and log files if requested.

Thanks, in advance.

Ken Russell

