[slurm-users] memory high water mark reporting

Emyr James via slurm-users Thu, 16 May 2024 15:08:52 -0700

Hi,

We are trying out Slurm after running Grid Engine for a long while.
In Grid Engine, the cgroup peak memory and max_rss are collected at the end of 
a job and recorded. It logs the information from the cgroup hierarchy and also 
does a getrusage call right at the end on the parent pid of the whole job 
"container" before cleaning up.
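
To illustrate the end-of-job getrusage approach described above, here is a minimal Python sketch. It is not Grid Engine's actual code: the function name run_and_get_peak_rss_kib and the test workload are made up for illustration, and it reads RUSAGE_CHILDREN after the child has been waited for, which is one way to get a max RSS figure without polling.

```python
import resource
import subprocess
import sys


def run_and_get_peak_rss_kib(argv):
    """Run argv to completion and return (exit code, peak RSS in KiB).

    RUSAGE_CHILDREN covers every child of this process that has been
    waited for, so ru_maxrss is the largest peak among all of them,
    not only the command started here.
    """
    completed = subprocess.run(argv, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss  # kilobytes on Linux


if __name__ == "__main__":
    # Hypothetical workload: a child Python process that builds a ~50 MB bytes object.
    code, peak = run_and_get_peak_rss_kib(
        [sys.executable, "-c", "x = b'x' * (50 * 1024 * 1024)"]
    )
    print(f"exit={code} peak_rss_kib={peak}")
```

The value is reported once, when the child exits, so a short-lived spike is still counted.
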
With Slurm it seems that the only way memory is recorded is by the acct gather 
polling. I am trying to add something in an epilog script to get the 
memory.peak, but it looks like the cgroup hierarchy has already been destroyed 
by the time the epilog runs.
Where in the code is the cgroup hierarchy cleaned up? Is there no way to add 
something so that the accounting is updated during the job cleanup process 
and peak memory usage can be accurately logged?
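
For reference, the kind of read attempted from the epilog can be sketched as below. This assumes cgroup v2, where a cgroup directory exposes a memory.peak file containing a byte count. The helper name read_memory_peak and the example path in the comment are hypothetical; the real location of the job's cgroup depends on the site's configuration.

```python
from pathlib import Path


def read_memory_peak(cgroup_dir):
    """Return memory.peak in bytes for a cgroup v2 directory.

    Returns None when the file or directory no longer exists, which is
    the situation described above when the hierarchy has already been
    removed before the epilog runs.
    """
    try:
        return int((Path(cgroup_dir) / "memory.peak").read_text().strip())
    except (FileNotFoundError, NotADirectoryError):
        return None


# Hypothetical usage; the path below is an assumption, not a known Slurm layout:
# peak = read_memory_peak("/sys/fs/cgroup/path/to/job/cgroup")
# print("gone" if peak is None else f"{peak} bytes")
```
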


I can reduce the polling interval from 30s to 5s, but I don't know whether this 
causes a lot of overhead. In any case, polling does not seem a sensible way to 
get values that should simply be determined right at the end by an event.
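
A small sketch of why polling can under-report the peak, using an entirely made-up per-second memory trace (the numbers and the helper polled_peak are illustrative, not measurements from Slurm):

```python
def polled_peak(trace, interval):
    """Largest value seen when sampling a per-second trace every `interval` seconds."""
    return max(trace[::interval])


# Made-up trace: 100 MB baseline for 60 s, with a 3 s spike to 900 MB at t=41..43.
trace = [100] * 60
trace[41:44] = [900, 900, 900]

true_peak = max(trace)             # what an end-of-job high water mark would report
peak_30s = polled_peak(trace, 30)  # samples at t=0 and t=30
peak_5s = polled_peak(trace, 5)    # samples at t=0, 5, ..., 55

print(true_peak, peak_30s, peak_5s)
```

In this trace both polling intervals miss the spike, because it falls between sample points. A shorter interval makes a miss less likely but does not rule it out.
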

Many thanks,

Emyr

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
