On 9/12/24 5:44 pm, Steven Jones via slurm-users wrote:
[2024-12-09T23:38:56.645] error: Munge decode failed: Rewound credential
[2024-12-09T23:38:56.645] auth/munge: _print_cred: ENCODED: Tue Dec 10 23:38:30 2024
[2024-12-09T23:38:56.645] auth/munge: _print_cred: DECODED: Mon Dec 09 23:38:56 2024
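A "Rewound credential" means munge decoded the credential before it was (apparently) encoded, i.e. the encoding host's clock is ahead of the decoding host's - here by roughly a day (Dec 10 vs Dec 09). A quick way to compare clocks across the nodes (a sketch, assuming root ssh between the hosts; adjust the host list to suit):

[root@vuwunicoslurmd1 ~]# for n in node1 node2 node3; do echo -n "$n: "; ssh $n date +%s; done

Differences of more than a few seconds are enough for munge to reject credentials.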
Hi,
As suggested,
8><---
Stop their services, start them manually one by one (ctld first), then
watch whether they talk to each other, and if they don't, learn what stops
them from doing so - then iterate editing the config, "scontrol reconfig",
lather, rinse, repeat.
8><---
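For reference, the "start them manually" step can look like this (a sketch; -D keeps each daemon in the foreground and -vvv raises verbosity, so failures print straight to the terminal):

[root@vuwunicoslurmd1 ~]# systemctl stop slurmctld && slurmctld -D -vvv
[root@node3 ~]# systemctl stop slurmd && slurmd -D -vvv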
Error logs are shown above.
Is the Slurm version critical? On node3:
[root@node3 /]# sinfo -V
slurm 20.11.9
[root@node3 /]# uname -a
Linux node3.ods.vuw.ac.nz 4.18.0-553.30.1.el8_10.x86_64 #1 SMP Tue Nov 26
18:56:25 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@node3 /]#
[root@vuwunicoslurmd1 log]# sinfo -V
slurm 22.05.9
[root@vuwunicoslurmd1 log]#
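So node3 reports 20.11.9 while vuwunicoslurmd1 (the controller, by the look of it) reports 22.05.9. Slurm generally only supports slurmd daemons up to two major releases older than slurmctld, so 20.11.9 against 22.05.9 is at the very edge of that window and worth ruling out. To see what every node actually runs (a sketch, again assuming root ssh between hosts):

[root@vuwunicoslurmd1 ~]# for n in node1 node2 node3; do echo -n "$n: "; ssh $n slurmd -V; done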
I cannot get node3 to work.
After some minutes, nodes 4~6 stop too, but that appears to be munge sulking.
Node7 never works; its hwclock seems to be faulty and I can't set it, so I'll ignore it.
My problem is node3: I can't fathom why, when nodes 1 & 2 run, node3 won't work with Slurm. It doesn't appear to be munge.
Does the accounting database keep this? Maybe I'm missing something, but I don't
see a way to query for it in sacct.
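If slurmdbd accounting is enabled, the data is usually in the database and the trick is finding the right field name. sacct can list every queryable field, and then you format on the ones you want (a sketch; 12345 is a placeholder job id):

[root@vuwunicoslurmd1 ~]# sacct --helpformat
[root@vuwunicoslurmd1 ~]# sacct -j 12345 --format=JobID,JobName,State,ExitCode,Start,End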
Chris
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
Hi,
I have fixed a time skew.
The nodes are still down, so it wasn't the time skew.
I have run tests as per munge docs and it all looks OK.
[root@node1 ~]# munge -n | unmunge | grep STATUS
STATUS: Success (0)
[root@node1 ~]#
[root@node1 ~]# munge -n | unmunge
STATUS: Success (0)
ENCODE_H
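The local round trip only exercises the local key and daemon; the path that fails here is node to node. The cross-host test from the munge docs is more telling (a sketch, using node3 as the suspect host):

[root@node1 ~]# munge -n | ssh node3 unmunge

If that fails while the local test succeeds, either the munge.key files differ between the hosts or their clocks do.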
Mmmm, from https://slurm.schedmd.com/sbatch.html
> By default both standard output and standard error are directed to a file
> of the name "slurm-%j.out", where the "%j" is replaced with the job
> allocation number.
Perhaps at your site there's a configuration which uses separate error
files? See the sbatch documentation linked above.
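For completeness, splitting the two streams takes just two sbatch directives (a sketch; %j expands to the job id):

#SBATCH --output=slurm-%j.out
#SBATCH --error=slurm-%j.err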
Dear Slurm-user list,
Sadly, my question got no answers. If the question is unclear and you
have ideas how I can improve it, please let me know. We will soon try to
update Slurm to see if the unwanted behavior disappears with that.
Best regards,
Xaver Stiensmeier
Hi,
On Sun, 2024-12-08 at 21:57:11, Slurm users wrote:
> I have just rebuilt all my nodes and I see
Did they ever work before with Slurm? (Which version?)
> Only 1 & 2 seem available?
> While 3~6 are not
Either you didn't wait long enough (5 minutes should be sufficient),
or the "down*" state has stuck and the nodes need to be resumed manually.
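The reason the controller recorded is the place to start, and a node that has since recovered still has to be resumed by hand (a sketch for node3):

[root@vuwunicoslurmd1 ~]# scontrol show node node3 | grep -i reason
[root@vuwunicoslurmd1 ~]# scontrol update nodename=node3 state=resume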