Hello,

I'm trying to get Slurm to limit the memory used per CPU, but it does
not seem to enforce the MaxMemPerCPU option set in slurm.conf.

This is running on Ubuntu 22.04 (cgroup v2), with Slurm 23.02.3.

Relevant configuration options:

,----cgroup.conf
| AllowedRAMSpace=100
| ConstrainCores=yes
| ConstrainRAMSpace=yes
| ConstrainSwapSpace=yes
| AllowedSwapSpace=0
`----
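
For reference, a quick way to confirm that the node really is on a
unified cgroup v2 hierarchy would be:

,----
| # "cgroup2fs" here means the unified cgroup v2 hierarchy is mounted
| stat -fc %T /sys/fs/cgroup/
`----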

,----slurm.conf
| TaskPlugin=task/affinity,task/cgroup
| PrologFlags=X11
| 
| SelectType=select/cons_res
| SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK
| MaxMemPerCPU=500
| DefMemPerCPU=200
| 
| JobAcctGatherType=jobacct_gather/linux
| 
| EnforcePartLimits=ALL
| 
| NodeName=xxx RealMemory=257756 Sockets=4 CoresPerSocket=8 ThreadsPerCore=1 Weight=1
| 
| PartitionName=batch       Nodes=duna State=UP Default=YES MaxTime=2-00:00:00 MaxCPUsPerNode=32 OverSubscribe=FORCE:1
| PartitionName=interactive Nodes=duna State=UP Default=NO  MaxTime=08:00:00   MaxCPUsPerNode=32 OverSubscribe=FORCE:2
`----
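
For what it's worth, one sanity check I can think of (assuming
"scontrol show config" reflects the live configuration) is to make
sure the running daemons actually picked up these values:

,----
| # Should report MaxMemPerCPU = 500 and DefMemPerCPU = 200
| scontrol show config | grep -i MemPerCPU
`----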


I can request an interactive session with 4GB/CPU (I would have
expected "EnforcePartLimits=ALL" to reject that, since it exceeds
MaxMemPerCPU=500), and once inside the session I can run a test
program that allocates 3GB without any issue (htop shows the process
with a RES size of 3GB at 100% CPU). Any idea what the problem could
be, or how to start debugging this?

,----
| [angelv@xxx test]$ sinter -n 1 --mem-per-cpu=4000
| salloc: Granted job allocation 127544
| salloc: Nodes xxx are ready for job
| 
| (sinter) [angelv@xxx test]$ stress -m 1 -t 600 --vm-keep --vm-bytes 3G
| stress: info: [1772392] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
`----
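
To start debugging, I was thinking of checking, from inside the
allocation, whether any memory limit gets written into the job's
cgroup at all; something along these lines (the memory.max path is
just my guess at how Slurm lays out the cgroup v2 hierarchy, so it may
need adjusting):

,----
| # What memory limits does Slurm think apply to this job?
| scontrol show job $SLURM_JOB_ID | grep -i mem
|
| # What limit (if any) ended up in the job's cgroup?
| # (path assumed for Slurm's cgroup v2 layout; adjust as needed)
| cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_${SLURM_JOB_ID}/memory.max
`----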

Many thanks,
-- 
Ángel de Vicente
 Research Software Engineer (Supercomputing and BigData)
 Tel.: +34 922-605-747
 Web.: http://research.iac.es/proyecto/polmag/

 GPG: 0x8BDC390B69033F52
