Both work. The only discrepancy is that the slurm controller output had these two lines:

UID: ??? (1000)
GID: ??? (1000)

Like the controller doesn't know the username for UID 1000. But it returned success 0

On Fri, Apr 17, 2020 at 2:00 PM Riebs, Andy <andy.ri...@hpe.com> wrote:

> A couple of quick checks to see if the problem is munge:
>
> 1. On the problem node, try
> $ echo foo | munge | unmunge
>
> 2. If (1) works, try this from the node running slurmctld to the
> problem node
> slurm-node$ echo foo | ssh node munge | unmunge
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *Dean Schulze
> *Sent:* Friday, April 17, 2020 3:40 PM
> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* Re: [slurm-users] Munge decode failing on new node
>
> There is no ntp service running on any of my nodes, and all but this one
> is working. I haven't heard that ntp is a requirement for slurm, just that
> the time be synchronized across the cluster. And it is.
>
> On Wed, Apr 15, 2020 at 12:17 PM Carlos Fenoy <mini...@gmail.com> wrote:
>
> I’d check ntp as your encoding time seems odd to me
>
> On Wed, 15 Apr 2020 at 19:59, Dean Schulze <dean.w.schu...@gmail.com>
> wrote:
>
> I've installed two new nodes onto my slurm cluster. One node works, but
> the other one complains about an invalid credential for munge. I've
> verified that the munge.key is the same as on all other nodes with
>
> sudo cksum /etc/munge/munge.key
>
> I recopied a munge.key from a node that works. I've verified that munge
> uid and gid are the same on the nodes. The time is in sync on all nodes.
>
> Here is what is in the slurmd.log:
>
> error: Unable to register: Unable to contact slurm controller (connect failure)
> error: Munge decode failed: Invalid credential
> ENCODED: Wed Dec 31 17:00:00 1969
> DECODED: Wed Dec 31 17:00:00 1969
> error: authentication: Invalid authentication credential
> error: slurm_receive_msg_and_forward: Protocol authentication error
> error: service_connection: slurm_receive_msg: Protocol authentication error
> error: Unable to register: Unable to contact slurm controller (connect failure)
>
> I've checked in the munged.log and all it says is
>
> Invalid credential
>
> Thanks for your help
>
> --
>
> --
> Carles Fenoy
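
A possible follow-up on the `???` entries in the unmunge output, offered only as a sketch: `???` suggests the host that ran unmunge has no passwd entry for UID 1000. The helper below checks that with `getent`, which is assumed to be available (it ships with glibc). The function name `uid_resolves` is made up for illustration and does not come from munge or slurm. Whether a missing entry has anything to do with the decode failure on the new node is not established in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of munge or slurm): report whether a
# numeric UID maps to a user name on the host where this is run.
uid_resolves() {
    # getent exits non-zero when the key is absent from the passwd database
    if entry=$(getent passwd "$1"); then
        # a passwd entry is colon-separated; the first field is the user name
        echo "UID $1 -> ${entry%%:*}"
    else
        echo "UID $1 has no passwd entry on $(uname -n)"
        return 1
    fi
}

# UID 1000 is the one shown as ??? in the unmunge output above
uid_resolves 1000 || true
```

Running this on the controller and on the new node and comparing the two results would show whether the account is defined on one host but not the other.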