It might be useful to include the various priority factors you've got configured. The fact that adjusting PriorityMaxAge had such a dramatic effect suggests that the age factor is quite high; it might be worth looking at that weight relative to the other factors.
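For reference, "sprio -w" will print the weights currently being applied, and "scontrol show config | grep -i priorityweight" will show what's set in slurm.conf. Just as an illustration of the balance I mean (these numbers are placeholders, not a recommendation for your site), it's the relationship between values like:

    PriorityWeightAge=10000
    PriorityWeightFairshare=100000
    PriorityWeightJobSize=10000
    PriorityWeightPartition=0
    PriorityWeightQOS=0

If the age weight is large relative to fairshare and job size, then changes to PriorityMaxAge will swing job priorities quite a bit, which would be consistent with what you saw.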
Have you looked at PriorityWeightJobSize? It might have some utility if you're finding that large jobs get short shrift.

- Michael

On Tue, Apr 9, 2019 at 2:01 AM David Baker <d.j.ba...@soton.ac.uk> wrote:
> Hello,
>
> I've finally got the job throughput/turnaround to be reasonable on our
> cluster. Most of the time the job activity on the cluster sets the default
> QOS to 32 nodes (there are 464 nodes in the default queue). Jobs requesting
> a number of nodes close to the QOS level (for example 22 nodes) are now
> scheduled within 24 hours, which is better than it has been. Still, I
> suspect there is room for improvement. I note that these large jobs still
> struggle to be given a start time; however, many jobs are now being given a
> start time following my SchedulerParameters makeover.
>
> I used advice from the mailing list and the Slurm high-throughput document
> to help me make changes to the scheduling parameters. They are now...
>
> SchedulerParameters=assoc_limit_continue,batch_sched_delay=20,bf_continue,bf_interval=300,bf_min_age_reserve=10800,bf_window=3600,bf_resolution=600,bf_yield_interval=1000000,partition_job_depth=500,sched_max_job_start=200,sched_min_interval=2000000
>
> Also..
> PriorityFavorSmall=NO
> PriorityFlags=SMALL_RELATIVE_TO_TIME,ACCRUE_ALWAYS,FAIR_TREE
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityMaxAge=1-0
>
> The most significant change was actually reducing "PriorityMaxAge" from 7-0
> to 1-0. Before that change the larger jobs could hang around in the queue
> for days. Does it make sense, therefore, to further reduce PriorityMaxAge
> to less than 1 day? Your advice would be appreciated, please.
>
> Best regards,
> David