Hi,

No changes. My example used /tmp, but the behaviour is the same for copies between any filesystems (e.g. from one distributed fs to another distributed fs).
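For reference, a quick way to check this on a node (a sketch, not something from the thread itself): `stat -f` prints the type of the filesystem backing a path, which would read `tmpfs` if /tmp were memory-backed.

```shell
# Print the filesystem type backing /tmp; "tmpfs" would mean it is
# memory-backed, so files written there count against the job's cgroup.
stat -f -c %T /tmp

# findmnt (util-linux) shows the same information from the mount table.
findmnt -T /tmp -o TARGET,FSTYPE
```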
Guillaume

----- Original Message -----
From: "Stijn De Weirdt via slurm-users" <slurm-users@lists.schedmd.com>
To: slurm-users@lists.schedmd.com
Sent: Thursday, 22 May 2025 11:33:18
Subject: [slurm-users] Re: Wrong MaxRSS Behavior with cgroup v2 in Slurm

hi guillaume,

nothing else is different between the v1 and v2 setup? (/tmp is tmpfs on the v2 setup perhaps?)

stijn

On 5/22/25 11:10, Guillaume COCHARD via slurm-users wrote:
> Hello,
>
> We've noticed a recent change in how MaxRSS is reported on our cluster.
> Specifically, the MaxRSS value for many jobs now often matches the allocated
> memory, which was not the case previously. It appears this change is due to
> how Slurm accounts for memory when copying large files, likely as a result of
> moving from cgroup v1 to cgroup v2.
>
> Here's a simple example:
>
> copy_file.sh
> #!/bin/bash
> cp /distributed/filesystem/file5G /tmp
> cp /tmp/file5G ~
>
> Two jobs with different memory allocations:
>
> Job 1
> sbatch -c 1 --mem=1G copy_file.sh
> seff <jobid>
> Memory Utilized: 1021.87 MB
> Memory Efficiency: 99.79% of 1.00 GB
>
> Job 2
> sbatch -c 1 --mem=10G copy_file.sh
> seff <jobid>
> Memory Utilized: 4.02 GB
> Memory Efficiency: 40.21% of 10.00 GB
>
> With cgroup v1, this script typically showed minimal memory usage. Now, under
> cgroup v2, memory usage appears inflated and depends on the allocated memory,
> which seems wrong.
>
> I believe this behavior aligns with similar issues raised by the Kubernetes
> community [1], and is consistent with how memory.current behaves in cgroup v2
> [3].
>
> According to Slurm's documentation about cgroup v2, "this plugin provides
> cgroup's memory.current value from the memory interface, which is not equal
> to the RSS value provided by procfs. Nevertheless it is the same value that
> the kernel uses in its OOM killer logic."
> [2]
>
> While technically correct, this seems to mark a significant change in what
> MaxRSS and "Memory Efficiency" actually measure and renders those metrics
> almost useless.
>
> Our Configuration:
> ProctrackType=proctrack/cgroup
> TaskPlugin=task/cgroup,task/affinity
>
> Question:
> Is there a way to restore more realistic MaxRSS values — specifically, ones
> that exclude file-backed page cache — while still using cgroup v2?
>
> Thanks,
> Guillaume
>
> References:
>
> [1] https://github.com/kubernetes/kubernetes/issues/118916
> [2] https://slurm.schedmd.com/cgroup_v2.html#limitations
> [3] https://facebookmicrosites.github.io/cgroup2/docs/memory-controller.html

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
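A footnote on the question quoted above: cgroup v2 does expose the split between anonymous memory and file-backed page cache in `memory.stat` (the `anon` and `file` fields described in [3]), so an RSS-like figure can at least be approximated outside of Slurm by subtracting the file cache from `memory.current`. A minimal sketch with illustrative numbers; in a real job the two values would be read from the job's cgroup directory under /sys/fs/cgroup:

```shell
# Illustrative values; in a real job they would come from the job's cgroup,
# e.g. memory_current=$(cat "$CG/memory.current")
#      file_cache=$(awk '/^file /{print $2}' "$CG/memory.stat")
memory_current=4316282880            # bytes, as memory.current would report
stat_sample='anon 104857600
file 4211425280
kernel 16777216'                     # a memory.stat-style excerpt

# The "file" field is the file-backed page cache charged to the cgroup.
file_cache=$(printf '%s\n' "$stat_sample" | awk '/^file /{print $2}')

echo "rss-like: $(( memory_current - file_cache )) bytes"
# → rss-like: 104857600 bytes
```

This only illustrates the arithmetic; whether Slurm's accounting can be made to report such a value is exactly the open question in the thread.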