Dear slurm-user list,
as far as I understand it, slurm.conf needs to be present on the
master and on the workers at the same path (if no other path is set via
SLURM_CONF). However, I noticed that when adding a partition only in the
master's slurm.conf, all workers were able to "correctly" show the new partition.
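Concretely, a sketch of what I mean (the partition name and node range are made up):

    # Added only to the master's /etc/slurm/slurm.conf:
    PartitionName=newpart Nodes=node[01-04] Default=NO State=UP

    # Yet from a worker this still lists the new partition, since
    # sinfo asks slurmctld rather than reading the local file:
    sinfo -p newpart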
I know this isn't a developer forum, but I don't really know where else to ask.
I've had no luck on Stack Overflow. Is there really no input on this?
We have some new AMD EPYC compute nodes with 96 cores/node running
Rocky Linux 8.9. We've had a number of incidents where the MUNGE log file
/var/log/munge/munged.log suddenly grows until it fills the root file
system to 100% (tens of GB), and the node eventually comes to a grinding
halt!
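As a stopgap until the underlying bug is fixed, a logrotate rule can at least cap the growth; a sketch, with arbitrary limits:

    # /etc/logrotate.d/munge (sketch)
    /var/log/munge/munged.log {
        daily
        rotate 5
        compress
        missingok
        # copytruncate, because munged keeps its log fd open
        copytruncate
    }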
https://github.com/dun/munge/issues/94
The NEWS file claims this was fixed in 0.5.15. Since your log doesn't show the
additional strerror() output, you must be running an older version,
correct?
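To confirm the installed version (assuming the Rocky/RHEL RPM):

    rpm -q munge
    # or ask the client directly:
    munge --version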
If you go on one of the affected nodes and do an `lsof -p <munged PID>`, I'm
betting you'll find a long list of open file descriptors.
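Something like this, assuming a single munged process on the node:

    lsof -p "$(pidof munged)" | wc -l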
Xaver,
If you look at your slurmctld log, you will likely see messages
about each node's slurm.conf not being the same as the one on the master.
So, yes, it can work temporarily, but unless some very specific settings
are in place, issues will arise. In the state you are in now, you
will wa
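The usual way out is to make slurm.conf identical everywhere and have the daemons re-read it; a rough sketch (hostnames are hypothetical):

    for h in node01 node02; do
        scp /etc/slurm/slurm.conf "$h:/etc/slurm/slurm.conf"
    done
    scontrol reconfigure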
---------- Forwarded message ---------
From: KK
Date: Mon, 15 Apr 2024, 13:25
Subject: sreport cluster UserUtilizationByAccount Used result versus
sreport job SizesByAccount or sacct: inconsistencies
To:
I wish to ascertain the CPU core hours utilized by users dj1 and dj. I have
tested with sreport cluster UserUtilizationByAccount.
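For illustration, something along these lines (the date range is made up; -t Hours makes sreport print core-hours):

    sreport cluster UserUtilizationByAccount Users=dj1,dj Start=2024-04-01 End=2024-04-15 -t Hours
    # Cross-check against the raw job records (CPUTimeRAW is in core-seconds):
    sacct -u dj1 -S 2024-04-01 -E 2024-04-15 -X --format=JobID,CPUTimeRAW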