Hi All,

I am having trouble determining the real RSS memory usage of certain users' jobs; sacct seems to return wrong numbers.
Environment: Rocky Linux release 8.5, Slurm 21.08, with the following in slurm.conf:

    ProctrackType=proctrack/cgroup
    JobAcctGatherType=jobacct_gather/linux

The troublesome jobs look like this:

1. Python spawns 96 threads using multithreading;
2. each thread runs scikit-learn, which in turn spawns 96 more threads via OpenMP.

This obviously oversubscribes the node, and I want to address it. The node has 300GB of RAM, yet sacct (and seff) reports 1.2TB MaxRSS (and AveRSS), which cannot be right. I suspect that Slurm with jobacct_gather/linux repeatedly sums the memory used by all of these threads, counting the same pages many times. Maybe the OpenMP part is handled fine by Slurm, while Python multithreading does not work well with its memory accounting? If that is the case, would 1.2TB/96 = 12.5GB be closer to the real MaxRSS?

I want to get the right MaxRSS to report to users.
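For what it is worth, here is a quick way to see why summing per-thread numbers overcounts: all threads of a process share one address space, so /proc reports the same VmRSS for every TID under /proc/<pid>/task/. A minimal demo I put together (the thread count of 8 is arbitrary, just for illustration):

    import os
    import threading

    def vm_rss_kb(path):
        """Return the VmRSS field (in kB) from a /proc status file."""
        with open(path) as f:
            for line in f:
                if line.startswith("VmRSS"):
                    return int(line.split()[1])
        return 0

    # Park a few threads so the process has multiple /proc/<pid>/task/<tid>/ entries.
    stop = threading.Event()
    threads = [threading.Thread(target=stop.wait) for _ in range(8)]
    for t in threads:
        t.start()

    pid = os.getpid()
    true_rss = vm_rss_kb(f"/proc/{pid}/status")

    # Every thread shares the same address space, so each per-thread status
    # file reports the identical VmRSS; summing across TIDs overcounts.
    summed = sum(vm_rss_kb(f"/proc/{pid}/task/{tid}/status")
                 for tid in os.listdir(f"/proc/{pid}/task"))

    print(f"process VmRSS:        {true_rss} kB")
    print(f"sum over all threads: {summed} kB")  # roughly 9x the real RSS here

    stop.set()
    for t in threads:
        t.join()

If an accounting gatherer walks the per-task entries and adds them up, a 96-thread process would show roughly 96 times its real RSS, which matches the numbers I am seeing.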
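And if the per-task summation is indeed the cause, would switching the accounting plugin to read from the cgroup (which counts each page once per job step) be the right fix? Something like this in slurm.conf (untested on my side):

    ProctrackType=proctrack/cgroup
    JobAcctGatherType=jobacct_gather/cgroup

Separately, to stop the oversubscription itself, I plan to ask users to cap the OpenMP fan-out in their job scripts, e.g.:

    export OMP_NUM_THREADS=1

Thanks!

Best,
Feng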