They fixed this in newer versions of Slurm. We had the same issue with
older versions, so we had to run with the config_override option on to
keep the logs quiet. They changed the way logging is done in the more
recent releases, and it's not as chatty.
-Paul Edmon-
On 5/12/22 7:35 AM, Per Lönnborg wrote:
Per Lönnborg writes:
> I "forgot" to tell you our version because it's a bit embarrassing - 19.05.8...
Haha! :D
--
B/H
Per Lönnborg writes:
> Greetings,
Good day!
> is there a way to lower the log rate on error messages in slurmctld for nodes
> with hardware errors?
You don't say which version of Slurm you are running, but I think this
was changed in 21.08, so the node will only try to register once if it has
Greetings,
is there a way to lower the log rate on error messages in slurmctld for nodes
with hardware errors?
For example, we see this for a node that has DIMM errors:
[2022-05-12T07:07:34.757] error: Node node37 has low real_memory size (257642 < 257660)
[2022-05-12T07:07:35.760] error:
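To put a number on how chatty this is, here is a minimal Python sketch that counts the "low real_memory" errors per node and estimates the interval between them. It assumes the message format quoted above, unwrapped onto one line, and it assumes the first number in the parentheses is the memory the node reported and the second is the RealMemory configured in slurm.conf (both in MB). The function name `summarize_low_memory` is hypothetical; it is not part of Slurm.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the slurmctld message quoted above. Assumes the whole message sits
# on one line (the mail client wrapped it in the original post).
LOW_MEM = re.compile(
    r"^\[(?P<ts>[^\]]+)\] error: Node (?P<node>\S+) has low real_memory size "
    r"\((?P<reported>\d+) < (?P<configured>\d+)\)"
)


def summarize_low_memory(lines):
    """Group 'low real_memory' errors by node (hypothetical helper).

    Returns {node: (count, shortfall_mb, mean_interval_s)}, where
    mean_interval_s is None if the node logged the error only once.
    """
    hits = defaultdict(list)
    shortfall = {}
    for line in lines:
        m = LOW_MEM.match(line)
        if not m:
            continue  # ignore unrelated log lines
        hits[m["node"]].append(datetime.fromisoformat(m["ts"]))
        # Assumption: first value = reported by the node, second = RealMemory
        # from slurm.conf, so the difference is how far the node falls short.
        shortfall[m["node"]] = int(m["configured"]) - int(m["reported"])

    summary = {}
    for node, stamps in hits.items():
        stamps.sort()
        if len(stamps) > 1:
            span = (stamps[-1] - stamps[0]).total_seconds()
            interval = span / (len(stamps) - 1)
        else:
            interval = None
        summary[node] = (len(stamps), shortfall[node], interval)
    return summary
```

Fed the slurmctld log line by line, this shows which nodes dominate the error volume and how often each one repeats. For node37 above, the shortfall works out to 257660 - 257642 = 18 MB.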