[slurm-users] Glusterfs hints for state database

2023-09-07 Thread Michael Gutteridge
We've settled on the idea of using a glusterfs file system for rolling out an HA Slurm controller. Over the last year we've averaged 88,000 job submissions per day, though it's usually lower than that (10-20K). Disk activity on the existing state database seems to be maxing out around 40-50 io/s wi
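For scale, the daily submission figures quoted in this excerpt can be turned into average per-second rates. This is plain arithmetic on the numbers in the post. It does not model how many state-save writes each job causes, and the helper name `per_second` is purely illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def per_second(jobs_per_day: float) -> float:
    """Convert a daily job-submission count to an average rate per second."""
    return jobs_per_day / SECONDS_PER_DAY


# Figures quoted in the post: the yearly average and the more typical range.
for label, daily in [("yearly average", 88_000), ("typical low", 10_000), ("typical high", 20_000)]:
    print(f"{label}: {daily:,} jobs/day = {per_second(daily):.2f} jobs/s")
```

At the yearly average this comes to roughly one submission per second, and the typical 10-20K range comes to about 0.12-0.23 submissions per second. Because these are daily averages, they say nothing about short bursts, which may matter more when sizing the file system.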

[slurm-users] Slurm version 23.02.5 is now available

2023-09-07 Thread Tim McMullan
We are pleased to announce the availability of Slurm version 23.02.5. The 23.02.5 release includes a number of stability fixes and some fixes for notable regressions. The SLURM_NTASKS environment variable that in 23.02.0 was not set when using --ntasks-per-node has been changed back to its 22

Re: [slurm-users] Tracking efficiency of all jobs on the cluster (dashboard etc.)

2023-09-07 Thread Angel de Vicente
Hi Will, Will Furnell - STFC UKRI writes: > That does sound like an interesting solution – yes please would you be > able to send me (or us if you’re willing to share it to the list) > through some more information please? > > And thank you everyone else that has replied to my email – there’s >

Re: [slurm-users] Problem with cgroup plugin in Ubuntu 22.04 and slurm 21.08.5

2023-09-07 Thread Angel de Vicente
Hello Cristobal, Cristóbal Navarro writes: > Hello Angel and Community, > I am facing a similar problem with a DGX A100 with DGX OS 6 (Based on > Ubuntu 22.04 LTS) and Slurm 23.02. > When I execute `slurmd` service, its status shows failed with the > following information below. > As of today, w

[slurm-users] Fwd: Limiting I/O speed in slurm jobs

2023-09-07 Thread Eugene Teoh
Hi guys, I'm trying to figure out how to set a per task/job limit for I/O speed (IOPS, throughput or maybe both, even better, io.latency ). After reading around the documentation and forums, I