Hi!
I did a fresh installation with the EPEL repo and installed munge from it, and it worked. Using the slurm user for munge was definitely a problem, but that is the setup we have on CentOS 6. Now I've learnt my lesson for future installations. Thanks to everyone!

I have a follow-up question, if you don't mind. I am now trying to run slurm, and it fails to start:

[root@roos21 ~]# systemctl status slurm.service
● slurm.service - LSB: slurm daemon management
   Loaded: loaded (/etc/rc.d/init.d/slurm; bad; vendor preset: disabled)
   Active: failed (Result: protocol) since Tue 2020-06-02 11:45:33 CEST; 3min 33s ago
     Docs: man:systemd-sysv-generator(8)

Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Starting LSB: slurm daemon management...
Jun 02 11:45:33 roos21.organ.su.se slurm[18223]: starting slurmd: [ OK ]
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Can't open PID file /var/run/slurmctld.pid (yet?) after start: No such file or directory
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Failed to start LSB: slurm daemon management.
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: Unit slurm.service entered failed state.
Jun 02 11:45:33 roos21.organ.su.se systemd[1]: slurm.service failed.

The thing is that this is a compute node, not the master node, so slurmctld is not installed. Why do I get this error?

Many thanks, and my apologies for these rather simple questions. I am a newbie at this.

Best,
Ferran

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Renata Maria Dart <ren...@slac.stanford.edu>
Sent: Friday, May 29, 2020 6:33:58 PM
To: ole.h.niel...@fysik.dtu.dk; Slurm User Community List
Subject: Re: [slurm-users] Problem with permisions. CentOS 7.8

Hi, I don't know if this might be your problem, but I ran into an issue on CentOS 7.8 where /var/run/munge was not being created at boot time because I didn't have the munge user in the local password file.
I have the munge user in AD, and once the system is up I can start munge successfully, but AD wasn't available early enough during boot for the munge startup to see it. I added these lines to the munge systemd unit file:

PermissionsStartOnly=true
ExecStartPre=-/usr/bin/mkdir -m 0755 -p /var/run/munge
ExecStartPre=-/usr/bin/chown -R munge:munge /var/run/munge

and my system now starts munge up fine during a reboot.

Renata

On Fri, 29 May 2020, Ole Holm Nielsen wrote:

> Hi Ferran,
>
> When you have a CentOS 7 system with the EPEL repo enabled, and you have
> installed the munge RPM from EPEL, then things should be working correctly.
>
> Since systemctl tells you that the Munge service didn't start correctly, it
> seems to me that you have a problem in the general configuration of your
> CentOS 7 system. You should check /var/log/messages and "journalctl -xe"
> for munge errors. It is really hard for other people to guess what may be
> wrong in your system.
>
> My 2 cents worth: Maybe you could make a fresh CentOS 7.8 installation on a
> test system and install the Munge service (and nothing else) according to
> the instructions in https://wiki.fysik.dtu.dk/niflheim/Slurm_installation.
> This *really* has got to work!
>
> /Ole
>
>
> On 29-05-2020 10:23, Ferran Planas Padros wrote:
>> Hello everyone,
>>
>> Here is everything I've done.
>>
>> - About Ole's answer:
>>
>> Yes, we have slurm as the user to control munge. Following your comment, I
>> have changed the ownership of the munge files and tried to start munge as
>> the munge user. However, it also failed.
>>
>> Also, I first installed munge from a repository. I've seen your suggestion of
>> installing from EPEL. So I uninstalled and installed again.
>> Same result.
>>
>> - About SELinux: It is disabled.
>>
>> - The output of ps -ef | grep munge is:
>>
>> root 5340 5153 0 10:18 pts/0 00:00:00 grep --color=auto munge
>>
>> - The output of munge -n is:
>>
>> Failed to access "/var/run/munge/munge.socket.2": No such file or directory
>>
>> - Same for unmunge
>>
>> - Output of sudo systemctl status --full munge:
>>
>> ● munge.service - MUNGE authentication service
>>    Loaded: loaded (/usr/lib/systemd/system/munge.service; enabled; vendor preset: disabled)
>>    Active: failed (Result: exit-code) since Fri 2020-05-29 10:15:52 CEST; 4min 18s ago
>>      Docs: man:munged(8)
>>   Process: 5333 ExecStart=/usr/sbin/munged (code=exited, status=1/FAILURE)
>>
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: Starting MUNGE authentication service...
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: munge.service: control process exited, code=exited status=1
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: Failed to start MUNGE authentication service.
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: Unit munge.service entered failed state.
>> May 29 10:15:52 roos21.organ.su.se systemd[1]: munge.service failed.
>>
>> - Regarding NTP, I get this message:
>>
>> Unable to talk to NTP daemon. Is it running?
>>
>> It is the same message I get on the nodes that DO work. All nodes are in
>> sync in time and date with the central node.
>>
>> ------------------------------------------------------------------------
>> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk>
>> *Sent:* Friday, May 29, 2020 9:56:10 AM
>> *To:* slurm-users@lists.schedmd.com
>> *Subject:* Re: [slurm-users] Problem with permisions.
>> CentOS 7.8
>> On 29-05-2020 08:46, Sudeep Narayan Banerjee wrote:
>>> also check:
>>> a) whether NTP has been set up and is communicating with the master node
>>> b) iptables may be flushed (iptables -L)
>>> c) SELinux set to disabled; to check:
>>> getenforce
>>> vim /etc/sysconfig/selinux
>>> (change SELINUX=enforcing to SELINUX=disabled, save the file and reboot)
>>
>> There is no reason to disable SELinux for running the Munge service.
>> It's a pretty bad idea to lower the security just for the sake of
>> convenience!
>>
>> /Ole
>>
>>
>>> On Fri, May 29, 2020 at 12:08 PM Sudeep Narayan Banerjee
>>> <snbaner...@iitgn.ac.in> wrote:
>>>
>>> I have not checked on CentOS 7.8
>>> a) if the /var/run/munge folder does not exist, then please double check
>>> whether munge has been installed or not
>>> b) as user root or a sudo user, run
>>> ps -ef | grep munge
>>> kill -9 <PID> //where PID is the Process ID for munge (if the
>>> process is running at all); else
>>>
>>> which munged
>>> /etc/init.d/munge start
>>>
>>> please let me know the output of:
>>>
>>> $ munge -n
>>>
>>> $ munge -n | unmunge
>>>
>>> $ sudo systemctl status --full munge
>>>
>>> Thanks & Regards,
>>> Sudeep Narayan Banerjee
>>> System Analyst | Scientist B
>>> Indian Institute of Technology Gandhinagar
>>> Gujarat, INDIA
>>>
>>>
>>> On Fri, May 29, 2020 at 11:55 AM Bjørn-Helge Mevik
>>> <b.h.me...@usit.uio.no> wrote:
>>>
>>> Ferran Planas Padros <ferran.pad...@su.se> writes:
>>>
>>> > I run the command as slurm user, and the /var/log/munge
>>> > folder does belong to slurm.
>>>
>>> For security reasons, I strongly advise that you run munged as a
>>> separate user, which is unprivileged and not used for anything
>>> else.
>>>
>>> --
>>> Regards,
>>> Bjørn-Helge Mevik, dr. scient,
>>> Department for Research Computing, University of Oslo
>>>
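
Renata's workaround quoted earlier in this thread can also be kept separate from the packaged unit file by putting it in a systemd drop-in, so a package update does not overwrite it. This is a minimal sketch, assuming the unit is named munge.service as in the status output above. The drop-in file name override.conf is an illustrative choice, not something stated in the thread. The three settings themselves are copied from Renata's message.

```ini
# /etc/systemd/system/munge.service.d/override.conf
# (the file name "override.conf" is an assumption made for this sketch)
[Service]
# Apply any User=/Group= setting of the unit only to ExecStart=,
# so the ExecStartPre= commands below run with root privileges.
PermissionsStartOnly=true
# The leading "-" tells systemd not to treat a failure of the command as fatal.
ExecStartPre=-/usr/bin/mkdir -m 0755 -p /var/run/munge
ExecStartPre=-/usr/bin/chown -R munge:munge /var/run/munge
```

After creating or changing a drop-in, run "systemctl daemon-reload" and restart the munge service so that systemd picks up the new settings.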