Hi
We've run into similar problems with backfill (though apparently not at the
scale you've got). We have a number of users who will drop 5,000+ jobs at
once; as you've indicated, this can play havoc with backfill.
One of the newer* parameters for the backfill scheduler that's been a real
help f
Hello list,
My cluster usually has a pretty heterogeneous job load and spends a lot of the
time memory-bound. Occasionally I have users who submit 100k+ short,
low-resource jobs. Despite having several thousand free cores and enough RAM to
run the jobs, the backfill scheduler would never back