Hi Reed,

Reed Dier <reed.d...@focusvq.com> writes:

> Hoping this will be an easy one for the community.
>
> The priority schema was recently reworked for our cluster, with only
> PriorityWeightQOS and PriorityWeightAge contributing to the priority
> value, while PriorityWeightAssoc, PriorityWeightFairshare,
> PriorityWeightJobSize, and PriorityWeightPartition are now set to 0,
> and PriorityFavorSmall set to NO.
> The cluster is fairly loaded right now, with a big backlog of work (~250
> running jobs, ~40K pending jobs).
> The majority of these jobs are arrays, which runs the pending job count up
> quickly.
>
> What I’m trying to figure out is:
> The next highest priority job array in the queue is waiting on resources,
> everything else on priority, which makes sense.
> However, there is a good portion of the cluster unused, seemingly
> dammed by the next up job being large, while there are much smaller
> jobs behind it that could easily fit into the available resources
> footprint.
>
> Is this an issue with the relative FIFO nature of the priority scheduling
> currently with all of the other factors disabled,
> or since my queue is fairly deep, is this due to bf_max_job_test being
> the default 100, and it can’t look deep enough into the queue to find
> a job that will fit into what is unoccupied?

It could be that bf_max_job_test is too low. On our system some users
think it is a good idea to submit lots of jobs with identical resource
requirements by writing a loop around sbatch. Such jobs use up the
bf_max_job_test limit very quickly. We therefore increased the limit to
1000 and try to persuade users to use job arrays instead of home-grown
loops. This seems to work OK[1].

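For reference, here is a minimal sketch of how the limit can be raised in
slurm.conf. The value 1000 is simply what we use, not a recommendation for
your site, and the fragment assumes you have no other SchedulerParameters
options set. If you already have a SchedulerParameters line, add the option
to its comma-separated list rather than adding a second line.

```
# slurm.conf (fragment)
# Scheduler and priority plugins as quoted in your mail
SchedulerType=sched/backfill
PriorityType=priority/multifactor

# Let the backfill scheduler consider up to 1000 jobs per cycle.
# 1000 is our site's value; pick one that suits your queue depth.
SchedulerParameters=bf_max_job_test=1000
```

You can check what is currently active with
"scontrol show config | grep SchedulerParameters".
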
Cheers,

Loris

> PriorityType=priority/multifactor
> SchedulerType=sched/backfill
>
> Hoping to know where I might want to swing my hammer next, without whacking
> the wrong setting
>
> Appreciate any advice,
> Reed
>

Footnotes:
[1] One problem we still have to address is that we don't have an
    array-enabled version of the 'subgXX' script for the quantum
    chemistry program Gaussian. This is a Perl script which parses the
    input for the program, generates a job script and submits it. An
    array-enabled version would have to stipulate a specific mapping
    between the array task ID and the way the input files are
    organised. We are currently not sure about the best way to do this
    in a suitably generic way.

-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin