Hello,

Trying again, this time with the slurm.conf. I have a cluster named Autobot. In this cluster I have servers Optimus[1-10] and Megatron[1-10].
I submitted 3000 jobs with feature Optimus; some are running and some are pending, which is OK. But I also submitted 1000 jobs to Megatron, and they are all pending, with the reason given as Priority. Why is that? By the way, if I change their priority to a higher one, they start to run on Megatron.

SLURM.CONF

ControlMachine=slurmserver
ControlAddr=131.1.1.1
AuthType=auth/munge
CacheGroups=0
CryptoType=crypto/munge
MpiDefault=none
MpiParams=ports=12000-12999
ProctrackType=proctrack/linuxproc
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
MaxJobCount=120000
PriorityType= priority/basic
TaskPlugin=task/none
InactiveLimit=0
KillWait=30
CompleteWait=10
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_LLN,CR_CPU_Memory
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/etc/slurm/slurmAccount.txt
AccountingStoreJobComment=YES
ClusterName=MyCluster
JobCompLoc=/var/log/slurm/jobcom.log
JobCompType=jobcomp/filetxt
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=4
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=4
SlurmdLogFile=/var/log/slurm/slurmd.log
PreemptMode=requeue
PreemptType=preempt/partition_prio
DefMemPerCPU=10
DebugFlags=NO_CONF_HASH

###############################################
#          C O M P U T E   N O D E S          #
###############################################

########################
#     SLURM Server     #
########################
NodeName=slurmserver NodeAddr=131.1.1.1 CPUs=4 State=UNKNOWN

########################
#   Autobot-Cluster    #
########################
NodeName=Optimus1 NodeAddr=131.1.20.31 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus2 NodeAddr=131.1.20.32 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus3 NodeAddr=131.1.20.33 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus4 NodeAddr=131.1.20.34 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus5 NodeAddr=131.1.20.35 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus6 NodeAddr=131.1.20.36 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus7 NodeAddr=131.1.20.37 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus8 NodeAddr=131.1.20.38 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus9 NodeAddr=131.1.20.39 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Optimus10 NodeAddr=131.1.20.40 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus
NodeName=Megatron1 NodeAddr=131.1.20.41 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron2 NodeAddr=131.1.20.42 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron3 NodeAddr=131.1.20.43 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron4 NodeAddr=131.1.20.44 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron5 NodeAddr=131.1.20.45 CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron6 NodeAddr=131.1.20.46 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron7 NodeAddr=131.1.20.47 CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron8 NodeAddr=131.1.20.48 CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron9 NodeAddr=131.1.20.49 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron
NodeName=Megatron10 NodeAddr=131.1.20.50 CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron

###############################################
#             P A R T I T I O N S             #
###############################################
PartitionName=Autobot-Cluster Nodes=Optimus[1-10],Megatron[1-10] Default=YES MaxTime=28800 State=UP LLN=YES Priority=10

Thanks in advance,
Roy
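
P.S. In case the bracket notation in the partition line is unclear: Optimus[1-10],Megatron[1-10] is meant as a node range covering all 20 compute nodes listed above. Below is a minimal Python sketch of how I read that notation. The function name expand_hostlist is my own (hypothetical, not part of Slurm), and it only handles a single prefix[lo-hi] range per comma-separated item, not the full hostlist syntax.

```python
import re


def expand_hostlist(expr):
    """Expand a simple host expression such as
    'Optimus[1-10],Megatron[1-10]' into individual node names.

    Illustrative only: supports one 'prefix[lo-hi]' range per
    comma-separated item, or a plain node name.
    """
    names = []
    # Pick out comma-separated items, keeping any bracketed range attached.
    for item in re.findall(r"[^,\[]+(?:\[[^\]]*\])?", expr):
        m = re.fullmatch(r"([^\[\]]+)\[(\d+)-(\d+)\]", item)
        if m:
            prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
            # Ranges are inclusive on both ends.
            names.extend(f"{prefix}{i}" for i in range(lo, hi + 1))
        else:
            # Plain node name without a range.
            names.append(item)
    return names


nodes = expand_hostlist("Optimus[1-10],Megatron[1-10]")
print(len(nodes))            # 20
print(nodes[0], nodes[-1])   # Optimus1 Megatron10
```

On a machine with Slurm installed, I believe scontrol show hostnames gives the same expansion for an expression like this.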