[slurm-users] Re: Print Slurm Stats on Login

2024-08-09 Thread Jeffrey T Frey via slurm-users
You'd have to do this within e.g. the system's bashrc infrastructure. The simplest idea would be to add to e.g. /etc/profile.d/zzz-slurmstats.sh and have some canned commands/scripts running. That does introduce load to the system and Slurm on every login, though, and slows the startup of logi
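
A minimal sketch of what such a profile.d script might look like. The function names, the output format, and the 5-second timeout are illustrative assumptions, not something taken from the original message. It only runs for interactive shells and gives up quickly if slurmctld is slow, to limit the login-time cost mentioned above.

```shell
# Hypothetical /etc/profile.d/zzz-slurmstats.sh (a sketch, not a tested site config).

# Read one job state per line on stdin and print a one-line summary.
summarize_states() {
    states=$(cat)
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    running=$(printf '%s\n' "$states" | grep -c '^RUNNING$' || true)
    pending=$(printf '%s\n' "$states" | grep -c '^PENDING$' || true)
    echo "Slurm jobs: ${running} running, ${pending} pending"
}

slurm_login_stats() {
    # Skip non-interactive shells (scp, rsync, batch steps) to avoid extra load.
    case "$-" in
        *i*) ;;
        *) return 0 ;;
    esac

    # Do nothing if the Slurm client tools are not installed on this host.
    command -v squeue >/dev/null 2>&1 || return 0

    # Bound the wait so a slow or unreachable slurmctld cannot hang the login.
    # -h drops the header, -u filters by user, -o "%T" prints only the job state.
    timeout 5 squeue -h -u "${USER:-$(id -un)}" -o "%T" 2>/dev/null | summarize_states
}

slurm_login_stats
```

Every interactive login still issues one squeue RPC with this approach; a site worried about controller load could instead print from a file that a cron job refreshes periodically.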

[slurm-users] Re: Slurmctld process error 'double free or corruption' on RHEL 9 (Rocky Linux)

2024-07-16 Thread Jeffrey T Frey via slurm-users
I can confirm on a freshly-installed RockyLinux 9.4 system, the dbus-devel package was not installed by default. The Development Tools
# dnf repoquery --groupmember dbus-devel
Last metadata expiration check: 2:04:16 ago on Tue 16 Jul 2024 12:02:50 PM EDT.
dbus-devel-1:1.12.20-8.el9.i686
dbus-de

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-16 Thread Jeffrey T Frey via slurm-users
> AFAIK, the fs.file-max limit is a node-wide limit, whereas "ulimit -n"
> is per user.
The ulimit is a frontend to rusage limits, which are per-process restrictions (not per-user). The fs.file-max is the kernel's limit on how many file descriptors can be open in aggregate. You'd have to edit
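
To make the distinction concrete, the following sketch prints both kinds of limit side by side. It assumes a Linux host with /proc mounted; the variable names are illustrative only.

```shell
# Per-process soft and hard limits on open file descriptors for this shell.
# Child processes inherit these values; they are not summed per user.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "per-process nofile: soft=${soft} hard=${hard}"

# Kernel-wide ceiling on open file handles across all processes (Linux only).
if [ -r /proc/sys/fs/file-max ]; then
    echo "fs.file-max: $(cat /proc/sys/fs/file-max)"
fi

# file-nr holds three fields: allocated handles, unused allocated handles, maximum.
if [ -r /proc/sys/fs/file-nr ]; then
    read -r allocated unused maximum < /proc/sys/fs/file-nr
    echo "system-wide handles: allocated=${allocated} maximum=${maximum}"
fi
```

Comparing the allocated count with the maximum shows how close the whole node is to the aggregate limit, independent of any single process's ulimit.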

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-15 Thread Jeffrey T Frey via slurm-users
https://github.com/dun/munge/issues/94 The NEWS file claims this was fixed in 0.5.15. Since your log doesn't show the additional strerror() output you're definitely running an older version, correct? If you go on one of the affected nodes and do an `lsof -p ` I'm betting you'll find a long
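
As a rough alternative to scanning lsof output by eye, the number of open descriptors can be counted through /proc. This is a sketch assuming Linux; `count_fds` is a hypothetical helper name, and inspecting another user's process generally requires root.

```shell
# Count the open file descriptors of a process via /proc (Linux only).
count_fds() {
    # $1 is the PID to inspect; prints 0 if the process or /proc is unavailable.
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Demonstration on the current shell. On an affected node one might pass the
# munged PID instead, e.g. from `pidof munged` (assuming the daemon is running).
echo "open fds in this shell: $(count_fds $$)"
```

A count that keeps growing between runs would be consistent with the descriptor leak described in the linked issue.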

[slurm-users] Re: Restricting local disk storage of jobs

2024-02-07 Thread Jeffrey T Frey via slurm-users
The native job_container/tmpfs would certainly have access to the job record, so modification to it (or a forked variant) would be possible. A SPANK plugin should be able to fetch the full job record [1] and is then able to inspect the "gres" list (as a C string), which means I could modify UD'

[slurm-users] Re: Restricting local disk storage of jobs

2024-02-06 Thread Jeffrey T Frey via slurm-users
Most of my ideas have revolved around creating file systems on-the-fly as part of the job prolog and destroying them in the epilog. The issue with that mechanism is that formatting a file system (e.g. mkfs.) can be time-consuming. E.g. formatting your local scratch SSD as an LVM PV+VG and all