Hi all, I have found a problem: I can submit jobs whose combined --mem requests exceed the RAM a node physically has.
For example, if my node has 128 GB of RAM, I can submit jobs requesting 200 GB in total and Slurm will move all of them to the Running state.

If I submit a single job like:

srun --mem=150000 -w node01 script.sh

it is rejected and the job does not start. But if I submit three jobs like:

srun --mem=50000 -w node01 script.sh
srun --mem=50000 -w node01 script.sh
srun --mem=50000 -w node01 script.sh

then Slurm runs all three and ignores the fact that 150 GB of memory is allocated in total, while the node physically has only 128 GB.

Can anyone help me stop Slurm from starting jobs that over-allocate memory?

slurm.conf:

ControlMachine=master
AuthType=auth/munge
CryptoType=crypto/munge
SlurmctldPidFile=/slurmctld.pid
SlurmctldPort=****
SlurmdPidFile=/slurmd.pid
SlurmdPort=****
SlurmdSpoolDir=l/slurmd
SlurmUser=slurmuser
SlurmdUser=slurmduser
StateSaveLocation=/slurmd
SwitchType=switch/none
TaskPlugin=task/none
InactiveLimit=0
KillWait=200
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
JobFileAppend=1
SallocDefaultCommand="$SHELL"
#
#
# SCHEDULING
DefMemPerNode=1000
#######################
FastSchedule=0
SchedulerType=sched/backfill
SchedulerPort=****
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
#
AccountingStorageEnforce=limits,qos
###
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
ClusterName=*cluster*
AccountingStorageHost=master
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=debug4
SlurmdDebug=debug4
# Job Preemption
PreemptType=preempt/partition_prio
##PreemptMode=SUSPEND,GANG
PreemptMode=REQUEUE
CacheGroups=0
# COMPUTE NODES
NodeName=master State=DOWN
NodeName=node[001-032] State=UNKNOWN Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=128000

--
Best Wishes,
Igor.
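P.S. From reading the select/cons_res documentation, my understanding is that with SelectTypeParameters=CR_CPU memory is not treated as a consumable resource, so the scheduler only checks each job's --mem against the node on its own, not against what is already allocated. If that is right, is a change along these lines (untested on my side, so please correct me) the proper fix?

# slurm.conf fragment -- my guess, not yet tried:
# make memory a consumable resource in addition to CPUs
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory

Or is there some other mechanism I am missing?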
