Hi,

To the best of my knowledge, MaxRSS does report the aggregated memory consumption
of all tasks. However, it includes all the shared libraries that the individual
processes use, even though a shared library is only loaded into memory
once, regardless of how many processes use it.

So shared libraries are counted multiple times (once for every individual
process) when the per-process values are summed up into MaxRSS. This can even
result in sacct reporting a MaxRSS value that is higher than the total amount
of physical memory available.
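A small toy model may make the double counting more concrete. The numbers below are made up for illustration (96 processes, 100 MB private memory each, one 960 MB library mapped by all of them), and the model assumes the library is mapped only by these processes. Summing RSS charges the library once per process; summing PSS splits each shared page evenly among the processes mapping it, so the library adds up to its real size once.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    private_mb: float  # memory mapped only by this process
    shared_mb: float   # size of a library mapped by every process (illustrative)


def sum_rss(procs):
    # RSS charges each process the full size of every resident page it maps,
    # so a library mapped by N processes is counted N times.
    return sum(p.private_mb + p.shared_mb for p in procs)


def sum_pss(procs):
    # PSS charges each process 1/N of a page shared by N processes,
    # so the shared library is counted exactly once in the total.
    n = len(procs)
    return sum(p.private_mb + p.shared_mb / n for p in procs)


procs = [Proc(private_mb=100.0, shared_mb=960.0) for _ in range(96)]
print(sum_rss(procs))  # 96 * (100 + 960) = 101760.0
print(sum_pss(procs))  # 96 * 100 + 960   = 10560.0
```

In this model the RSS sum is roughly ten times the memory actually in use, purely because of the shared mapping.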

For exactly that reason we have decided to use
`JobAcctGatherParams=UsePss` in slurm.conf. We think proportional set
size (PSS) is more useful than RSS: PSS charges each process only its
share of a shared page (a page shared by N processes counts 1/N towards
each of them), so the sum of the PSS values over all processes seems to
be a better representation of the "true" total memory consumption of
the job.
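To compare the two numbers by hand on a node, here is a minimal sketch. It assumes a Linux kernel that provides /proc/<pid>/smaps_rollup (4.14 or newer) and that you already know the PIDs of the job's processes, e.g. from the job's cgroup. The helper names are hypothetical, and this is not how Slurm's jobacct_gather plugin is implemented.

```python
import os
import re

# Matches only the plain "Rss:" and "Pss:" lines (values in kB), not
# "Pss_Anon:", "Pss_File:" etc. that newer kernels also print.
_FIELD = re.compile(r"^(Rss|Pss):\s+(\d+)\s+kB", re.MULTILINE)


def parse_rollup(text):
    """Return {'Rss': kB, 'Pss': kB} parsed from smaps_rollup contents."""
    return {key: int(val) for key, val in _FIELD.findall(text)}


def job_totals(pids):
    """Hypothetical helper: sum Rss and Pss (kB) over the given PIDs.

    Pass process IDs, not thread IDs: threads share their process's
    address space, so adding them separately would count it again.
    """
    totals = {"Rss": 0, "Pss": 0}
    for pid in pids:
        try:
            with open(f"/proc/{pid}/smaps_rollup") as f:
                fields = parse_rollup(f.read())
        except OSError:
            continue  # process has exited, file missing, or not readable
        for key in totals:
            totals[key] += fields.get(key, 0)
    return totals


if __name__ == "__main__":
    print(job_totals([os.getpid()]))
```

For a job with many processes sharing libraries, the summed Rss should come out noticeably larger than the summed Pss.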

Also see:

https://en.wikipedia.org/wiki/Proportional_set_size

Best regards
Jürgen


* Feng Zhang via slurm-users <slurm-users@lists.schedmd.com> [240607 14:37]:
> Hi All,
> 
> I am having trouble calculating the real RSS memory usage of some kinds
> of users' jobs, for which sacct returns wrong numbers.
> 
> Rocky Linux release 8.5, Slurm 21.08
> 
> (slurm.conf)
> ProctrackType=proctrack/cgroup
> JobAcctGatherType=jobacct_gather/linux
> 
> The troubling jobs are like:
> 
> 1. Python spawns 96 threads using multithreading;
> 
> 2. Each thread uses sklearn, which in turn spawns 96 threads using OpenMP.
> 
> This obviously oversubscribes the node, and I want to address it.
> 
> The node has 300GB RAM, but sacct (and seff) reports 1.2TB
> MaxRSS (and AveRSS). This does not look correct.
> 
> 
> I suspect that Slurm with jobacct_gather/linux repeatedly
> sums up the memory used by all these threads, counting the
> same memory many times.
> 
> For the OpenMP part, maybe it is fine for Slurm; but maybe
> Python multithreading does not work well with Slurm's
> memory accounting?
> 
> So, if this is the case, maybe the real MaxRSS is 1.2TB/96 ≈ 12.5GB?
> 
> I want to get the right MaxRSS to report to users.
> 
> Thanks!
> 
> Best,
> 
> Feng
> 
> -- 
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
