Hello,

We have large jobs being starved out on our cluster, and I note in particular that we never see a job being assigned a start time. It seems quite possible that backfilled jobs are taking nodes reserved for large or higher-priority jobs.
I'm wondering whether our backfill configuration has any bearing on this issue, or whether we are unfortunate enough to have hit a bug. One parameter that is missing from our backfill setup is "bf_continue". Is that parameter significant in ensuring that backfill drills down far enough into the job mix? Also, we are using the default backfill frequency. Should we reduce the frequency, and potentially reduce the number of backfill jobs considered per group/user, or in total, at each iteration? Currently, I think we are setting the per-user limit to 20.

Any thoughts would be appreciated, please.

Best regards,
David