I'm not sure I can help with the rest, but EnforcePartLimits only rejects a job at submission time if it exceeds *partition* limits, not cluster-wide limits such as the global MaxMemPerCPU in your slurm.conf. Offhand, I don't see anything in the interactive partition definition that a request for 4 GB/CPU would exceed.
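
On the MaxMemPerCPU side: if I'm reading the slurm.conf man page correctly, a global MaxMemPerCPU is not enforced by rejecting the job at all. When a request's --mem-per-cpu exceeds it, Slurm instead raises the job's CPU count so the per-CPU ratio holds (4000 MB against MaxMemPerCPU=500 would become 8 CPUs with 4000 MB in total), and the cgroup then caps the job at that total, which a 3 GB stress run fits under comfortably. A quick way to check is sketched below; scontrol show job is standard, but the cgroup path is my assumption for a systemd/cgroup v2 layout and may differ on your node:

,----
| # Did Slurm silently bump the CPU count to satisfy MaxMemPerCPU?
| # (job ID 127544 taken from your transcript)
| scontrol show job 127544 | grep -E 'NumCPUs|MinMemoryCPU'
|
| # Was a memory cap actually written into the job's cgroup?
| # Path is an assumption for a cgroup v2 / systemd setup:
| cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_127544/memory.max
`----

If NumCPUs comes back as 8 rather than 1, the limit is being applied as documented (scaled, not rejected), and a memory.max of roughly 4 GB would explain why the 3 GB test runs without complaint.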
Rob

________________________________
From: slurm-users on behalf of Angel de Vicente
Sent: Monday, July 24, 2023 7:20 AM
To: Slurm User Community List
Subject: [slurm-users] MaxMemPerCPU not enforced?

Hello,

I'm trying to get Slurm to control the memory used per CPU, but it does
not seem to enforce the MaxMemPerCPU option in slurm.conf. This is
running on Ubuntu 22.04 (cgroups v2) with Slurm 23.02.3. Relevant
configuration options:

,---- cgroup.conf
| AllowedRAMSpace=100
| ConstrainCores=yes
| ConstrainRAMSpace=yes
| ConstrainSwapSpace=yes
| AllowedSwapSpace=0
`----

,---- slurm.conf
| TaskPlugin=task/affinity,task/cgroup
| PrologFlags=X11
|
| SelectType=select/cons_res
| SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK
| MaxMemPerCPU=500
| DefMemPerCPU=200
|
| JobAcctGatherType=jobacct_gather/linux
|
| EnforcePartLimits=ALL
|
| NodeName=xxx RealMemory=257756 Sockets=4 CoresPerSocket=8 ThreadsPerCore=1 Weight=1
|
| PartitionName=batch Nodes=duna State=UP Default=YES MaxTime=2-00:00:00 MaxCPUsPerNode=32 OverSubscribe=FORCE:1
| PartitionName=interactive Nodes=duna State=UP Default=NO MaxTime=08:00:00 MaxCPUsPerNode=32 OverSubscribe=FORCE:2
`----

I can ask for an interactive session with 4 GB/CPU (I would have thought
that "EnforcePartLimits=ALL" would stop me from doing that), and once I'm
in the interactive session I can run a 3 GB test program without any
issues (I can see with htop that the process does indeed reach a RES size
of 3 GB at 100% CPU use). Any idea what the problem could be, or how to
start debugging this?

,----
| [angelv@xxx test]$ sinter -n 1 --mem-per-cpu=4000
| salloc: Granted job allocation 127544
| salloc: Nodes xxx are ready for job
|
| (sinter) [angelv@xxx test]$ stress -m 1 -t 600 --vm-keep --vm-bytes 3G
| stress -m 1 -t 600 --vm-keep --vm-bytes 3G
| stress: info: [1772392] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
`----

Many thanks,
--
Ángel de Vicente
Research Software Engineer (Supercomputing and BigData)
Tel.: +34 922-605-747
Web.: http://research.iac.es/proyecto/polmag/
GPG: 0x8BDC390B69033F52