Hello,

I understand that this is not a straightforward question, but I'm 
wondering if anyone has any useful ideas, please. Our cluster is busy and the 
QOS has limited users to a maximum of 32 compute nodes on the "batch" queue. 
Users are making good use of the cluster; for example, one user is running five 
6-node jobs at the moment. On the other hand, a job belonging to another user 
has been stalled in the queue for around 7 days. He has made reasonable use of 
the cluster and, as a result, his fairshare component is relatively low. Having 
said that, the priority of his job is high: it is currently one of the highest 
priority jobs in the batch partition queue. From sprio...


JOBID   PARTITION  PRIORITY     AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS
359323  batch        180292  100000      79646      547        100    0


I did think that the PriorityDecayHalfLife was quite high at 14 days and so I 
reduced that to 7 days. For reference I've included the key scheduling settings 
from the cluster below. Does anyone have any thoughts, please?


Best regards,

David


PriorityDecayHalfLife   = 7-00:00:00
PriorityCalcPeriod      = 00:05:00
PriorityFavorSmall      = No
PriorityFlags           = ACCRUE_ALWAYS,SMALL_RELATIVE_TO_TIME,FAIR_TREE
PriorityMaxAge          = 7-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType            = priority/multifactor
PriorityWeightAge       = 100000
PriorityWeightFairShare = 1000000
PriorityWeightJobSize   = 10000000
PriorityWeightPartition = 1000
PriorityWeightQOS       = 10000
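
As a rough cross-check of the numbers above, here is a minimal sketch that divides each sprio component by its configured weight. It assumes the multifactor plugin simply sums weight times a factor in the range 0 to 1, with no site factor or nice adjustment. The dictionary and function names are made up for illustration. The components add up to 180293, one more than the 180292 that sprio reports, presumably because each column is rounded.

```python
# Weighted components reported by sprio for job 359323 (copied from the output above).
components = {
    "age": 100000,
    "fairshare": 79646,
    "jobsize": 547,
    "partition": 100,
    "qos": 0,
}

# PriorityWeight* values from the configuration above.
weights = {
    "age": 100000,
    "fairshare": 1000000,
    "jobsize": 10000000,
    "partition": 1000,
    "qos": 10000,
}


def normalized_factors(components, weights):
    """Recover the approximate 0..1 factor behind each weighted component.

    Assumption: component = weight * factor, with no other terms.
    """
    return {name: components[name] / weights[name] for name in components}


factors = normalized_factors(components, weights)
total = sum(components.values())

for name, factor in factors.items():
    print(f"{name:10s} factor={factor:.6f} contribution={components[name]}")
print("sum of components:", total)
```

Read this way, the age factor is already at its maximum of 1.0 (the job has waited as long as PriorityMaxAge), while the job size weight, although the largest in the configuration, contributes almost nothing to this job's priority.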


