Hoping this will be an easy one for the community. The priority schema was recently reworked for our cluster, so that only PriorityWeightQOS and PriorityWeightAge contribute to the priority value, while PriorityWeightAssoc, PriorityWeightFairshare, PriorityWeightJobSize, and PriorityWeightPartition are now set to 0, and PriorityFavorSmall is set to NO. The cluster is fairly loaded right now, with a big backlog of work (~250 running jobs, ~40K pending jobs). Most of these jobs are arrays, which drives the pending job count up quickly.
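
For context, the priority block in slurm.conf now looks roughly like this (the two nonzero weights below are placeholders for illustration, not our actual numbers):

PriorityFavorSmall=NO
PriorityWeightQOS=10000
PriorityWeightAge=1000
PriorityWeightAssoc=0
PriorityWeightFairshare=0
PriorityWeightJobSize=0
PriorityWeightPartition=0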
What I'm trying to figure out is this: the next-highest-priority job array in the queue is pending with reason Resources, and everything behind it is pending on Priority, which makes sense. However, a good portion of the cluster is sitting idle, seemingly dammed up behind that next-up job being large, while there are much smaller jobs further back in the queue that could easily fit into the available resources.

Is this just the effectively FIFO behavior of priority scheduling now that all the other factors are disabled? Or, since the queue is fairly deep, is it that bf_max_job_test is still at the default of 100, so the backfill scheduler can't look far enough into the queue to find a job that fits into what is unoccupied?

Relevant settings:

PriorityType=priority/multifactor
SchedulerType=sched/backfill

Hoping to know where I might want to swing my hammer next, without whacking the wrong setting.

Appreciate any advice,
Reed
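
P.S. In case it helps to see what I'm contemplating, this is roughly the slurm.conf change I'd be tempted to try. The specific numbers are placeholders I picked for illustration, not values I've tested on this cluster:

# Let backfill consider more than the first 100 pending jobs, keep scheduling
# after lock releases, and reserve resources for more tasks per job array.
SchedulerParameters=bf_max_job_test=1000,bf_continue,bf_max_job_array_resv=64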