[slurm-users] Slurm.conf and workers

2024-04-15 Thread Xaver Stiensmeier via slurm-users
Dear slurm-user list, as far as I understood it, the slurm.conf needs to be present on the master and on the workers at slurm.conf (if no other path is set via SLURM_CONF). However, I noticed that when adding a partition only in the master's slurm.conf, all workers were able to "correctly" show t

[slurm-users] Re: Interfaces of topology/tree and Topology Awareness

2024-04-15 Thread Nico Derl via slurm-users
I know this isn't a developer forum, but I don't really know where else to ask. I've had no luck with Stackoverflow. Is there no input on this?

[slurm-users] Munge log-file fills up the file system to 100%

2024-04-15 Thread Ole Holm Nielsen via slurm-users
We have some new AMD EPYC compute nodes with 96 cores/node running RockyLinux 8.9. We've had a number of incidents where the Munge log-file /var/log/munge/munged.log suddenly fills up the root file system, after a while to 100% (tens of GBs), and the node eventually comes to a grinding halt!

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-15 Thread Jeffrey T Frey via slurm-users
https://github.com/dun/munge/issues/94 The NEWS file claims this was fixed in 0.5.15. Since your log doesn't show the additional strerror() output you're definitely running an older version, correct? If you go on one of the affected nodes and do an `lsof -p ` I'm betting you'll find a long
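
The version question in the reply above can be sketched as a small check. This is only an illustration: it assumes a version string of the form `munge-0.5.13 (2017-09-26)`, and the helper names `parse_munge_version` and `is_fixed` are hypothetical. The 0.5.15 threshold is taken from the NEWS file claim quoted in the message.

```python
import re

# Release in which the NEWS file is said to list the fix (from the message above).
FIXED_IN = (0, 5, 15)


def parse_munge_version(text):
    """Extract (major, minor, patch) from a string such as 'munge-0.5.13 (2017-09-26)'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_fixed(text):
    """Return True if the reported version is at or above the release with the fix."""
    return parse_munge_version(text) >= FIXED_IN
```

Comparing integer tuples avoids the string-comparison trap where "0.5.9" would sort after "0.5.15".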

[slurm-users] Re: Slurm.conf and workers

2024-04-15 Thread Brian Andrus via slurm-users
Xaver, If you look at your slurmctld log, you likely end up seeing messages about each node's slurm.conf not being the same as that on the master. So, yes, it can work temporarily, but unless there are some very specific settings done, issues will arise. The state you are in now, you will wa
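
A rough way to spot the mismatch described above before it shows up in the slurmctld log is to compare a digest of each node's slurm.conf against the master's. The sketch below is not how Slurm performs its own check; it assumes the file contents have already been collected (for example over ssh) into a dictionary, and the names `config_digest` and `nodes_out_of_sync` are hypothetical.

```python
import hashlib


def config_digest(text):
    """SHA-256 of the config text, ignoring line-ending and trailing-whitespace differences."""
    lines = [line.rstrip() for line in text.splitlines()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def nodes_out_of_sync(master_conf, node_confs):
    """Return the sorted names of nodes whose slurm.conf differs from the master's.

    master_conf: contents of the master's slurm.conf as a string.
    node_confs:  mapping of node name -> contents of that node's slurm.conf.
    """
    reference = config_digest(master_conf)
    return sorted(
        name for name, conf in node_confs.items()
        if config_digest(conf) != reference
    )
```

For instance, if only the master's file gained a new partition line, every worker would be reported by `nodes_out_of_sync` until the updated file is distributed.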

[slurm-users] Fwd: sreport cluster UserUtilizationByaccount Used result versus sreport job SizesByAccount or sacct: inconsistencies

2024-04-15 Thread KK via slurm-users
-- Forwarded message - From: KK Date: Mon, 15 Apr 2024 13:25 Subject: sreport cluster UserUtilizationByaccount Used result versus sreport job SizesByAccount or sacct: inconsistencies To: I wish to ascertain the CPU core hours utilized by user dj1 and dj. I have tested with sreport cl