Hello,

We do have large jobs getting starved out on our cluster, and I note
particularly that we never manage to see a job getting assigned a start
time. It seems very possible that backfilled jobs are stealing nodes
reserved for large/higher priority jobs.

I'm wondering whether our backfill configuration has any bearing on this
issue, or whether we are unfortunate enough to have hit a bug. One parameter
that is missing from our backfill setup is "bf_continue". Is that parameter
significant in ensuring that the backfill scheduler looks deeply enough into
the job mix? Also, we are using the default backfill frequency. Should we
reduce the frequency, and perhaps also reduce the number of jobs backfill
considers per group/user or in total at each iteration? Currently, I think
we are setting the per-user limit to 20.
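
To make the question concrete, here is a rough sketch of the kind of
slurm.conf line I have in mind. It is illustrative only and not copied from
our actual configuration: the per-user value of 20 is from memory, 30 seconds
is the documented default backfill interval, and bf_continue is the option we
do not currently set.

    # Illustrative sketch only (assumed values, not our real slurm.conf)
    SchedulerType=sched/backfill
    # bf_continue:     carry on through the queue after backfill periodically
    #                  releases its locks, instead of starting over
    # bf_interval:     seconds between backfill iterations (default is 30)
    # bf_max_job_user: maximum jobs per user that backfill attempts to start
    SchedulerParameters=bf_continue,bf_interval=30,bf_max_job_user=20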

Any thoughts would be appreciated, please.

Best regards,
David
