On 5/2/20 1:44 pm, Antony Cleave wrote:
> Hi, from what you are describing it sounds like jobs are backfilling in
> front and stopping the large jobs from starting.
We use a feature that SchedMD implemented for us called
"bf_min_prio_reserve", which lets you set a priority threshold below
which Slurm won't make a forward reservation for a job (so a job under
the threshold can only start if it can start right now without delaying
other jobs).
https://slurm.schedmd.com/slurm.conf.html#OPT_bf_min_prio_reserve
So if you can arrange your local priority system so that large jobs are
over that threshold and smaller jobs are below it (or whatever suits
your use case) then you should have a way to let these large jobs get a
reliable start time without smaller jobs pushing them back in time.
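As a rough sketch of what that could look like in slurm.conf (the
threshold of 1000000 is a made-up value purely for illustration, you'd
need to pick one that sits between the priorities your small and large
jobs actually get):

```
# slurm.conf (fragment)
# bf_min_prio_reserve is a backfill scheduler option.
SchedulerType=sched/backfill
# Jobs with a priority below this value get no forward reservation.
# 1000000 is an illustrative assumption, not a recommended setting.
SchedulerParameters=bf_min_prio_reserve=1000000
```

If you already have a SchedulerParameters line, add the option to its
existing comma-separated list rather than adding a second line. Looking
at the output of "sprio" for your pending jobs should give you an idea
of where a sensible threshold might lie.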
There's some useful background from the bug where this was implemented:
https://bugs.schedmd.com/show_bug.cgi?id=2565
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA