Hi all,

I found a problem: I can submit jobs whose combined --mem requests exceed 
the RAM the node physically has.

For example, if my node has 128GB of RAM, I can submit jobs requesting a total 
of 200GB and Slurm will still move them all to the Running state.

If I submit a single job like:

srun --mem=150000 -w node01 script.sh

it will be rejected, and the job will not start.

But if I submit three jobs like:

srun --mem=50000 -w node01 script.sh
srun --mem=50000 -w node01 script.sh
srun --mem=50000 -w node01 script.sh

then Slurm will run all three jobs, ignoring that the 150GB total allocated 
exceeds the 128GB the node physically has.

Can anyone help me configure Slurm so it stops starting jobs whose combined 
memory requests overallocate a node's physical RAM?
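From what I can tell in the slurm.conf docs, with SelectTypeParameters=CR_CPU 
memory is not treated as a consumable resource, so --mem is only checked per 
job against the node size, not summed across running jobs. I suspect (not yet 
tested on this cluster) that one of the *_Memory variants is needed, e.g.:

    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU_Memory

and possibly FastSchedule=1, so that the RealMemory value from slurm.conf is 
used for scheduling instead of whatever slurmd reports. I would appreciate 
confirmation before changing this on a production system.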

My slurm.conf:

ControlMachine=master
AuthType=auth/munge
CryptoType=crypto/munge
SlurmctldPidFile=/slurmctld.pid
SlurmctldPort=****
SlurmdPidFile=/slurmd.pid
SlurmdPort=****
SlurmdSpoolDir=l/slurmd
SlurmUser=slurmuser
SlurmdUser=slurmduser
StateSaveLocation=/slurmd
SwitchType=switch/none
TaskPlugin=task/none
InactiveLimit=0
KillWait=200
MinJobAge=300

SlurmctldTimeout=120
SlurmdTimeout=300

Waittime=0
JobFileAppend=1
SallocDefaultCommand="$SHELL"
#
#
# SCHEDULING
DefMemPerNode=1000
#######################
FastSchedule=0
SchedulerType=sched/backfill
SchedulerPort=****
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
#
AccountingStorageEnforce=limits,qos
###
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
ClusterName=*cluster*
AccountingStorageHost=master

JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=debug4
SlurmdDebug=debug4

# Job Preemption
PreemptType=preempt/partition_prio
##PreemptMode=SUSPEND,GANG
PreemptMode=REQUEUE

CacheGroups=0

# COMPUTE NODES
NodeName=master State=DOWN
NodeName=node[001-032] State=UNKNOWN Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=128000


--
Best Wishes,
Igor.
