Hello,
I'm trying again, this time with the slurm.conf included.

I have a cluster named Autobot.
In this cluster I have these servers:
Optimus[1-10] and
Megatron[1-10].

I sent 3000 jobs with feature Optimus; some are running and the rest are
pending, which is OK.
But I have also sent 1000 jobs to Megatron, and they are all pending, with the
reason given as priority. Why is that?

By the way, if I raise their priority, they start to run on Megatron.

SLURM.CONF

ControlMachine=slurmserver
ControlAddr=131.1.1.1
AuthType=auth/munge
CacheGroups=0
CryptoType=crypto/munge
MpiDefault=none
MpiParams=ports=12000-12999
ProctrackType=proctrack/linuxproc
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
MaxJobCount=120000
PriorityType=priority/basic
TaskPlugin=task/none
InactiveLimit=0
KillWait=30
CompleteWait=10
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_LLN,CR_CPU_Memory
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/etc/slurm/slurmAccount.txt
AccountingStoreJobComment=YES
ClusterName=MyCluster
JobCompLoc=/var/log/slurm/jobcom.log
JobCompType=jobcomp/filetxt
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=4
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=4
SlurmdLogFile=/var/log/slurm/slurmd.log
PreemptMode=requeue
PreemptType=preempt/partition_prio
DefMemPerCPU=10
DebugFlags=NO_CONF_HASH


###############################################
#   C O M P U T E    N O D E S                #
###############################################


########################
#   SLURM Server       #
########################
NodeName=slurmserver  NodeAddr=131.1.1.1   CPUs=4 State=UNKNOWN



########################
#   Autobot-Cluster     #
########################
NodeName=Optimus1   NodeAddr=131.1.20.31    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus2   NodeAddr=131.1.20.32    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus3   NodeAddr=131.1.20.33    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus4   NodeAddr=131.1.20.34    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus5   NodeAddr=131.1.20.35    CPUs=24 RealMemory=96728  State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus6   NodeAddr=131.1.20.36    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus7   NodeAddr=131.1.20.37    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus8   NodeAddr=131.1.20.38    CPUs=12 RealMemory=64410  State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus9   NodeAddr=131.1.20.39    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus10  NodeAddr=131.1.20.40    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,optimus

NodeName=Megatron1   NodeAddr=131.1.20.41    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron2   NodeAddr=131.1.20.42    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron3   NodeAddr=131.1.20.43    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron4   NodeAddr=131.1.20.44    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron5   NodeAddr=131.1.20.45    CPUs=24 RealMemory=96728  State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron6   NodeAddr=131.1.20.46    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron7   NodeAddr=131.1.20.47    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron8   NodeAddr=131.1.20.48    CPUs=12 RealMemory=64410  State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron9   NodeAddr=131.1.20.49    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron10  NodeAddr=131.1.20.50    CPUs=12 RealMemory=96728  State=UNKNOWN Feature=autobot,megatron


###############################################
#       P A R T I T I O N S                   #
###############################################
PartitionName=Autobot-Cluster Nodes=Optimus[1-10],Megatron[1-10]  Default=YES MaxTime=28800 State=UP  LLN=YES Priority=10



Thanks in advance,
Roy

