Hi Roy,

What command are you using to start the jobs?

On 11/14/2017 09:58 AM, Zohar Roe MLM wrote:

Hello,

Trying again, this time with the slurm.conf.

I have a cluster named Autobot.

In this cluster I have the servers Optimus[1-10] and Megatron[1-10].

I submitted 3000 jobs with the feature Optimus; some are running and some are pending, which is OK.

But I also submitted 1000 jobs to Megatron, and they are all pending, with the reason given as priority. Why is that?

By the way, if I raise their priority, they start running on Megatron.
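
For reference, the steps described above could be reproduced roughly as follows. This is only a sketch: it assumes the jobs are submitted with sbatch and selected by feature through --constraint, which the message does not state. The function names, the script name job.sh and the job ID 12345 are placeholders for illustration.

```shell
#!/bin/sh
# Sketch only: job.sh and job ID 12345 in the usage comments are placeholders.

# Submit one job restricted to nodes that carry the given feature.
submit_with_feature() {
    # $1 = feature name (e.g. megatron), $2 = batch script
    sbatch --constraint="$1" "$2"
}

# List pending jobs with their integer priority (%Q) and pending reason (%r).
show_pending() {
    squeue --states=PENDING --format="%.10i %.9P %.8u %.10Q %r"
}

# Set a job's priority by hand; raising it normally needs admin rights.
raise_priority() {
    # $1 = job ID, $2 = new priority
    scontrol update JobId="$1" Priority="$2"
}

# Usage:
#   submit_with_feature megatron job.sh
#   show_pending
#   raise_priority 12345 1000
```

The reason column printed by show_pending is where a value such as Priority or Resources would appear for each waiting job.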

SLURM.CONF

ControlMachine=slurmserver

ControlAddr=131.1.1.1

AuthType=auth/munge

CacheGroups=0

CryptoType=crypto/munge

MpiDefault=none

MpiParams=ports=12000-12999

ProctrackType=proctrack/linuxproc

ReturnToService=2

SlurmctldPidFile=/var/run/slurmctld.pid

SlurmctldPort=6817

SlurmdPidFile=/var/run/slurmd.pid

SlurmdPort=6818

SlurmdSpoolDir=/var/spool/slurmd

SlurmUser=slurm

StateSaveLocation=/var/spool/slurmctld

SwitchType=switch/none

MaxJobCount=120000

PriorityType=priority/basic

TaskPlugin=task/none

InactiveLimit=0

KillWait=30

CompleteWait=10

MinJobAge=300

SlurmctldTimeout=120

SlurmdTimeout=300

Waittime=0

FastSchedule=1

SchedulerType=sched/backfill

SchedulerPort=7321

SelectType=select/cons_res

SelectTypeParameters=CR_LLN,CR_CPU_Memory

AccountingStorageType=accounting_storage/filetxt

AccountingStorageLoc=/etc/slurm/slurmAccount.txt

AccountingStoreJobComment=YES

ClusterName=MyCluster

JobCompLoc=/var/log/slurm/jobcom.log

JobCompType=jobcomp/filetxt

JobAcctGatherFrequency=30

JobAcctGatherType=jobacct_gather/none

SlurmctldDebug=4

SlurmctldLogFile=/var/log/slurm/slurmctld.log

SlurmdDebug=4

SlurmdLogFile=/var/log/slurm/slurmd.log

PreemptMode=requeue

PreemptType=preempt/partition_prio

DefMemPerCPU=10

DebugFlags=NO_CONF_HASH

###############################################

#   C O M P U T E    N O D E S                #

###############################################

########################

#   SLURM Server       #

########################

NodeName=slurmserver NodeAddr=131.1.1.1   CPUs=4 State=UNKNOWN

########################

#   Autobot-Cluster     #

########################

NodeName=Optimus1 NodeAddr=131.1.20.31    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus

NodeName=Optimus2 NodeAddr=131.1.20.32    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus

NodeName=Optimus3 NodeAddr=131.1.20.33    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus

NodeName=Optimus4 NodeAddr=131.1.20.34    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus

NodeName=Optimus5 NodeAddr=131.1.20.35    CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus

NodeName=Optimus6 NodeAddr=131.1.20.36    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus

NodeName=Optimus7 NodeAddr=131.1.20.37    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,optimus

NodeName=Optimus8 NodeAddr=131.1.20.38    CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,optimus

NodeName=Optimus9 NodeAddr=131.1.20.39    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus

NodeName=Optimus10 NodeAddr=131.1.20.40    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,optimus

NodeName=Megatron1 NodeAddr=131.1.20.41    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron

NodeName=Megatron2 NodeAddr=131.1.20.42    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron

NodeName=Megatron3 NodeAddr=131.1.20.43    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron

NodeName=Megatron4 NodeAddr=131.1.20.44    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron

NodeName=Megatron5 NodeAddr=131.1.20.45    CPUs=24 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron

NodeName=Megatron6 NodeAddr=131.1.20.46    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron

NodeName=Megatron7 NodeAddr=131.1.20.47    CPUs=16 RealMemory=129022 State=UNKNOWN Feature=autobot,megatron

NodeName=Megatron8 NodeAddr=131.1.20.48    CPUs=12 RealMemory=64410 State=UNKNOWN Feature=autobot,megatron

NodeName=Megatron9 NodeAddr=131.1.20.49    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron

NodeName=Megatron10 NodeAddr=131.1.20.50    CPUs=12 RealMemory=96728 State=UNKNOWN Feature=autobot,megatron

###############################################

#       P A R T I T I O N S                   #

###############################################

PartitionName=Autobot-Cluster Nodes=Optimus[1-10],Megatron[1-10]  Default=YES MaxTime=28800 State=UP  LLN=YES Priority=10

Thanks in advance,

Roy

