Hi Steffen,

We are using Slurm on Debian Stretch at SURFsara on our LISA cluster. We've been using the Debian Slurm packages (https://salsa.debian.org/hpc-team/slurm-wlm) with a couple of patches, although we're now looking into a different option.

Anyway, the daemons probably won't start because they're looking for their PID files in the wrong locations. Check whether SlurmctldPidFile and SlurmdPidFile in slurm.conf match the PID file paths in the systemd service files. If they don't, systemd waits for a PID file that never appears, which would explain the timeout.

I haven't seen "scontrol reconfig" kill slurmctld, so I can't help you there. Did you raise the log verbosity to a debug level and check whether slurmctld logs anything before it gets killed?

As for the munge UID, I don't remember offhand how we handle that.

Regards,
Martijn

On Wed, 2020-03-04 at 08:54 +0100, Steffen Grunewald wrote:
> Good morning,
>
> is there anyone out there, running Slurm on a Debian Stretch
> platform?
> I've been maintaining a HTCondor pool for quite some time, and
> recently started an attempt to convert some of the compute nodes to
> form a Slurm cluster instead.
>
> I ran into some issues I could only partially resolve yet:
> - magic UID for "slurm" user, but none for "munge" (and since the
>   munge key has to be shared, unique UIDs are essential)
> - daemons don't start (timeout) when using "service ... start",
>   running with -D and backgrounding doesn't show anything weird
> - "scontrol reconfig" tends to kill the slurmctld
>
> Upgrading to Buster isn't an option yet, and I doubt the issues would
> vaporize by upgrading.
>
> Any suggestions?
>
> Thanks,
>  - S
>
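Regarding the PID-file check: here is a minimal Python sketch that compares SlurmctldPidFile and SlurmdPidFile in slurm.conf against the PIDFile= lines of the systemd units. The paths (/etc/slurm-llnl/slurm.conf and the units under /lib/systemd/system/) are assumptions based on Debian's packaging, and the helper names are hypothetical, not part of Slurm. It only compares literal values: it ignores systemd drop-in overrides, Include lines in slurm.conf, and Slurm's built-in default when a key is unset (reported as None).

```python
from pathlib import Path

# ASSUMPTION: typical Debian Stretch locations; adjust for your site.
SLURM_CONF = Path("/etc/slurm-llnl/slurm.conf")
UNITS = {
    "SlurmctldPidFile": Path("/lib/systemd/system/slurmctld.service"),
    "SlurmdPidFile": Path("/lib/systemd/system/slurmd.service"),
}


def parse_slurm_conf(text):
    """Return {key.lower(): value} for key=value lines, dropping # comments.

    Slurm parameter names are case-insensitive, so keys are lowercased.
    """
    conf = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip().lower()] = value.strip()
    return conf


def unit_pidfile(text):
    """Return the PIDFile= value from a systemd unit file, or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("PIDFile="):
            return line[len("PIDFile="):].strip()
    return None


def check_pidfiles(conf_text, unit_texts):
    """Return (key, slurm.conf value, unit value) for every mismatch."""
    conf = parse_slurm_conf(conf_text)
    mismatches = []
    for key, unit_text in unit_texts.items():
        slurm_value = conf.get(key.lower())  # None means "not set in slurm.conf"
        unit_value = unit_pidfile(unit_text)
        if slurm_value != unit_value:
            mismatches.append((key, slurm_value, unit_value))
    return mismatches


if __name__ == "__main__":
    # Only run against the real files if they all exist on this host.
    if SLURM_CONF.exists() and all(p.exists() for p in UNITS.values()):
        units = {key: path.read_text() for key, path in UNITS.items()}
        for key, ours, theirs in check_pidfiles(SLURM_CONF.read_text(), units):
            print(f"{key}: slurm.conf has {ours!r}, unit file has {theirs!r}")
```

Any line it prints is a candidate for the startup timeout; fix either slurm.conf or the unit (ideally via a drop-in) so both name the same path.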