[slurm-users] Backfill advice

david baker Sat, 23 Mar 2019 05:09:56 -0700

Hello,

We have large jobs getting starved out on our cluster, and I notice in
particular that we never see a job being assigned a start time. It seems
quite possible that backfilled jobs are stealing nodes reserved for
large/higher-priority jobs.


I'm wondering whether our backfill configuration has any bearing on this
issue or whether we are unfortunate enough to have hit a bug. One parameter
that is missing from our backfill setup is "bf_continue". Is that parameter
significant in ensuring that the backfill scheduler drills down sufficiently
into the job mix? Also, we are using the default backfill frequency. Should
we reduce the frequency, and potentially reduce the number of backfill jobs
per group/user or in total at each iteration? Currently, I think we are
setting the per-user limit to 20.
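
For concreteness, the parameters in question would sit in slurm.conf roughly
as sketched below. This is an illustration only, not the cluster's actual
configuration: the per-user limit of 20 is the one figure taken from the
message above, and the other values are assumed to be the documented defaults.

```
# slurm.conf fragment (illustrative values, not a recommendation)
SchedulerType=sched/backfill

# bf_continue        keep working through the original job list after the
#                    backfill scheduler releases its locks, instead of
#                    starting again from the top of the queue
# bf_interval=30     seconds between backfill iterations (assumed default)
# bf_max_job_user=20 jobs per user considered in each backfill cycle
#                    (the per-user limit mentioned above)
# bf_window=1440     minutes of look-ahead when planning start times
#                    (assumed default)
# bf_resolution=60   seconds of time resolution used when planning
#                    (assumed default)
SchedulerParameters=bf_continue,bf_interval=30,bf_max_job_user=20,bf_window=1440,bf_resolution=60
```

The values actually in effect can be checked with "scontrol show config",
and "squeue --start" lists the expected start times the scheduler has
computed for pending jobs.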

Any thoughts would be appreciated, please.

Best regards,
David
