[slurm-dev] Re: SLURM 17.02.8 not optimally scheduling jobs/utilizing resources

Ole Holm Nielsen Wed, 25 Oct 2017 04:57:34 -0700


On 10/25/2017 01:52 PM, Holger Naundorf wrote:

I'd really appreciate any help the SLURM wizards can provide! We suspect
it's something to do with how we've set up QoS, or maybe we need to
tweak the scheduler configuration in 17.02.8; however, there's no single
clear path forward. Just let me know if there's any further information
I can provide to help troubleshoot or give fodder for suggestions.


While I am in no way a SLURM wizard, one thing I would try is
increasing 'bf_max_job_test' to something much bigger (on the order of
the usual number of jobs in your queue). With the current setting (as
far as I understand it), as soon as your 50 top-priority queued jobs are
waiting for 'legitimate' reasons (i.e. their designated nodes/QOS are
full), everything below them will no longer get backfilled.

I agree that the backfill scheduler requires configuration beyond the
default settings! This surprised me as well. I wrote some notes in my
Wiki which could be used as a starting point:
https://wiki.fysik.dtu.dk/niflheim/Slurm_scheduler#backfill-scheduler
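
As a rough sketch of the 'bf_max_job_test' change suggested above, the
relevant lines in slurm.conf might look like the following. The value
1000 is only a placeholder and not a recommendation; pick something
comparable to the number of jobs usually waiting in your queue, and keep
any other SchedulerParameters options you already have in that
comma-separated list.

```
# slurm.conf (fragment); the values below are illustrative assumptions
SchedulerType=sched/backfill
# bf_max_job_test: maximum number of queued jobs the backfill
# scheduler will consider in one pass
SchedulerParameters=bf_max_job_test=1000
```

After editing, 'scontrol reconfigure' should make slurmctld pick up the
change, and 'scontrol show config | grep SchedulerParameters' shows the
value actually in effect.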


/Ole
