Greetings,

I've got Slurm 17.11.7 running on a Scientific Linux 6 cluster. Things are working great.
I have a Scientific Linux 7 system that I just want to be able to run sinfo/squeue/sacct on. I installed 17.11.7 from the OpenHPC repo (it's what we have running on the other SL7 cluster). The munge.key and the slurm.conf file are exactly the same as on the rest of the system.

I can communicate easily between the slurmd/slurmdbd cluster host and the new system. I can run 'munge -n | ssh <host> unmunge' in both directions and have it all work. The munge service is running, and just to be sure I've restarted it several times. I've disabled firewalls, and SELinux is in permissive mode (just in case; it will go back on after I figure this out).

When I run sinfo I get the following output:

    sinfo: error: Couldn't find the specified plugin name for auth/munge looking at all files
    sinfo: error: cannot find auth plugin for auth/munge
    sinfo: error: cannot create auth context for auth/munge
    sinfo: error: Couldn't find the specified plugin name for auth/munge looking at all files
    sinfo: error: cannot find auth plugin for auth/munge
    sinfo: error: cannot create auth context for auth/munge
    sinfo: error: Couldn't find the specified plugin name for auth/munge looking at all files
    sinfo: error: cannot find auth plugin for auth/munge
    sinfo: error: cannot create auth context for auth/munge
    sinfo: error: authentication: authentication initialization failure
    slurm_load_partitions: Protocol authentication error

When I look at the slurmctld.log file I see this error:

    error: slurm_receive_msg [IP:49916]: Zero Bytes were transmitted or received

I do have the slurm-munge package installed on the client (as well as most of the other slurm packages).

I suspect it is something with the OHPC rpms, but I'm not sure. Any thoughts on how to fix this, or should I just rebuild from source?

Thanks!
~Stack~
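Since the error says the auth/munge plugin cannot be found, one thing worth checking is whether the plugin file is actually in the directory that slurm.conf points to. Below is a minimal sketch of that check, not a definitive diagnosis. The helper names `plugin_dir_from_conf` and `check_auth_plugin` are hypothetical, and the sketch assumes that PluginDir is set explicitly in slurm.conf as a single directory and that the plugin file is named auth_munge.so.

```shell
#!/bin/sh
# Hypothetical helpers for checking that the auth/munge plugin file exists.
# Assumptions: PluginDir is set explicitly in slurm.conf and names a single
# directory; the plugin file is called auth_munge.so.

# Print the last PluginDir value found in the given slurm.conf
# (prints nothing if the parameter is not set).
plugin_dir_from_conf() {
    conf="$1"
    grep -i '^[[:space:]]*PluginDir' "$conf" \
        | tail -n 1 \
        | sed 's/^[^=]*=[[:space:]]*//; s/[[:space:]#].*$//'
}

# Report whether auth_munge.so is present in the given directory.
# Returns 0 if the file exists, 1 otherwise.
check_auth_plugin() {
    plugin_dir="$1"
    if [ -f "$plugin_dir/auth_munge.so" ]; then
        echo "found: $plugin_dir/auth_munge.so"
        return 0
    fi
    echo "missing: $plugin_dir/auth_munge.so"
    return 1
}
```

On the client this could be run as `check_auth_plugin "$(plugin_dir_from_conf /etc/slurm/slurm.conf)"`, where the slurm.conf path is an assumption and may differ for the OpenHPC packages. Because the slurm.conf was copied from the SL6 cluster, a PluginDir that is valid there may not match where the SL7 rpms installed their plugins.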