On Mon, Nov 13, 2017 at 10:15 AM, Benjamin Redling <benjamin.ra...@uni-jena.de> wrote:
> On 11/12/17 4:52 PM, Gennaro Oliva wrote:
>
>> On Sun, Nov 12, 2017 at 10:03:18AM -0500, Will L wrote:
>>
>>> I just tried `sudo apt-get remove --purge munge`, etc., and munge itself
>>>
>> this should have uninstalled slurm-wlm also, did you reinstalled it with
>> apt?
>>
>>> seems to be working fine. But I still get `slurmctld: error: Couldn't find
>>> the specified plugin name for crypto/munge looking at all files`. Is
>>> there
>>>
>> if you didn't reinstall slurm with apt you may be using the slurmctld
>> executable from a failed source installation, and for some reason this
>> can't find the corresponding plugin directory.
>>
>> I suggest to try to install the slurm-wlm package with:
>>
>> apt-get install slurm-wlm
>>
> I would *currently* avoid the Debian (Stretch) packages like the plague:
> the last update tried to (re)start slurmctld which -- surprise, surprise
> -- fails on every node that's not the master with an exit code that leaves
> the packages unconfigured.
>
> That raises the question if anyone did bother to test them on a multi-node
> cluster
>
> I'm still hoping I messed that up and not the maintainer. Maybe I expect
> to much from an "apt upgrade" nowadays...
>
> Regards,
> Benjamin
> --
> FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
> ☎ +49 3641 9 44323
>

For Debian stretch, the Slurm daemons have been split into separate packages, so the compute nodes only need the slurmd package. slurm-wlm installs everything and is therefore only needed on the controller nodes. Alternatively, skip slurm-wlm altogether and install just slurmctld on the controller nodes and slurmd on the compute nodes.
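
A rough sketch of that per-role split, in case it helps. The helper name `slurm_packages_for_role` and the role names `controller` and `compute` are made up here for illustration; they are not part of Slurm or Debian. The function only prints the package name, so the actual install stays an explicit step:

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm or Debian): map a node role to the
# Debian stretch package that ships the daemon this role needs.
slurm_packages_for_role() {
    case "$1" in
        controller) echo "slurmctld" ;;  # controller daemon only
        compute)    echo "slurmd" ;;     # compute-node daemon only
        *)
            echo "unknown role: $1" >&2
            return 1
            ;;
    esac
}

# Example use on a compute node (left commented out, since it needs root):
# pkg=$(slurm_packages_for_role compute) && sudo apt-get install "$pkg"
```

With this split, a package upgrade on a compute node should have no slurmctld service to restart, since that package is never installed there.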