Jean-mathieu CHANTREIN <jean-mathieu.chantr...@univ-angers.fr> writes:
> I tried using, in slurm.conf:
>
>   TaskPlugin=task/affinity, task/cgroup
>   SelectTypeParameters=CR_CPU_Memory
>   MemLimitEnforce=yes
>
> and in cgroup.conf:
>
>   CgroupAutomount=yes
>   ConstrainCores=yes
>   ConstrainRAMSpace=yes
>   ConstrainSwapSpace=yes
>   MaxSwapPercent=10
>   TaskAffinity=no

We have a very similar setup, the biggest difference being that we have
MemLimitEnforce=no and leave the killing to the kernel's cgroup. For us,
jobs are killed as they should be. Here are a couple of things you could
check:

- Does it work if you remove the space in
  "TaskPlugin=task/affinity, task/cgroup", i.e.
  TaskPlugin=task/affinity,task/cgroup? (Slurm can be quite picky when
  reading slurm.conf.)

- Look in slurmd.log on the node(s) of the job to see whether the
  cgroup actually gets activated and starts limiting memory for the
  job, or whether there are any cgroup-related errors (see the grep
  example after this list).

- While a job is running, look in the job's cgroup memory directory on
  the compute node (typically
  /sys/fs/cgroup/memory/slurm/uid_<num>/job_<num>). Do the values
  there, for instance memory.limit_in_bytes and
  memory.max_usage_in_bytes, make sense? (A short sketch follows the
  grep example below.)
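For the slurmd.log check, something like this usually turns up the
relevant lines. (The path below is only a common choice, not a
guarantee: the actual location is whatever SlurmdLogFile in your
slurm.conf points to, or syslog if it is unset.)

  # on the compute node
  grep -i cgroup /var/log/slurmd.log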
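And a minimal sketch of the cgroup check, assuming cgroup v1 and a
made-up job 12345 run by uid 1000 that requested 4 GiB (substitute the
real job ID, uid, and memory request):

  # on the compute node, while the job is running
  cd /sys/fs/cgroup/memory/slurm/uid_1000/job_12345

  cat memory.limit_in_bytes
  # expect roughly the job's memory request in bytes,
  # e.g. 4294967296 for a 4 GiB request

  cat memory.max_usage_in_bytes
  # peak usage so far; if this stays far below the limit,
  # the job never actually hits the cgroup limit

  # cross-check against what Slurm thinks the job requested
  scontrol show job 12345 | grep -i mem

If memory.limit_in_bytes shows the node's total RAM or a huge default
value instead of the request, the memory cgroup is not being
constrained, which points at the slurm.conf/cgroup.conf side rather
than at the kernel.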
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo