We've been using a backfill priority partition for people doing HTC
work. We have requeue preemption set so that jobs from the high-priority
partitions can take over.
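A minimal slurm.conf sketch of that kind of setup (the partition names, node list, and priority values are illustrative assumptions, not the poster's actual config):

```ini
# slurm.conf fragment -- hypothetical names and values
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# Low-priority backfill partition for HTC work; jobs here are requeued
# when a job in the higher-tier partition needs the nodes.
PartitionName=backfill Nodes=node[001-100] PriorityTier=1  PreemptMode=REQUEUE Default=NO
# Normal partition on the same hardware takes precedence.
PartitionName=normal   Nodes=node[001-100] PriorityTier=10 Default=YES
```

For the requeue to work, the HTC jobs themselves must be requeueable (submitted with `--requeue`, or with `JobRequeue=1` set cluster-wide).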
You can do this for your interactive nodes as well if you want. We
dedicate hardware to interactive work and use partition-based QOSs to …
That’s the first limit I placed on our cluster, and it has generally worked out
well (never used a job limit). A single account can get 1000 CPU-days in
whatever distribution they want. I’ve just added a root-only ‘expedited’ QOS
for times when the cluster is mostly idle, but a few users have jobs …
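One common way to express a "1000 CPU-days in whatever distribution they want" limit is `GrpTRESRunMins` on the account's association; this is a hedged sketch, not necessarily the poster's exact method, and the account and QOS names are placeholders:

```shell
# 1000 CPU-days = 1000 * 24 * 60 = 1,440,000 CPU-minutes.
# Cap the CPU-minutes an account can have in flight across running jobs:
sacctmgr modify account myaccount set GrpTRESRunMins=cpu=1440000

# A high-priority 'expedited' QOS, granted only to root's association
# so ordinary users cannot select it:
sacctmgr add qos expedited
sacctmgr modify qos expedited set Priority=100000
sacctmgr modify user root set qos+=expedited
```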
On 05/08/2018 09:49 AM, John Hearns wrote:
Actually what IS bad is users not putting cluster resources to good use.
You can often see jobs which are 'stalled', i.e. the nodes are reserved
for the job, but the internal logic of the job has failed and the
executables have not launched. Or maybe s…
"Otherwise a user can have a single job that takes the entire cluster,
and inside split it up the way he wants to."
Yair, I agree. That is what I was referring to regarding interactive jobs.
Perhaps not a user reserving the entire cluster,
but a user reserving a lot of compute nodes and not making s…
> Eventually the job aging makes the jobs so high-priority,
Guess I should look in the manual, but could you increase the job ageing
time parameters?
I guess it is also worth saying that this is the scheduler doing its job -
it is supposed to keep jobs ready and waiting to go, to keep the cluster
busy.
Hi,
This is what we did, not sure those are the best solutions :)
## Queue stuffing
We have set PriorityWeightAge several magnitudes lower than
PriorityWeightFairshare, and we also have PriorityMaxAge set to cap the
age factor of older jobs. As I see it, the fairshare is far more important
than age. Besides t…
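The settings described above might look roughly like this slurm.conf fragment; the specific weights are illustrative assumptions, not the poster's values:

```ini
# slurm.conf sketch -- values are assumptions for illustration
PriorityType=priority/multifactor
PriorityWeightFairshare=100000   # dominant factor
PriorityWeightAge=100            # several magnitudes lower than fairshare
PriorityMaxAge=7-0               # age factor saturates after 7 days
```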
On 05/08/2018 08:44 AM, Bjørn-Helge Mevik wrote:
Jonathon A Anderson writes:
> ## Queue stuffing
There is the bf_max_job_user SchedulerParameter, which is sort of the
"poor man's MAXIJOB"; it limits the number of jobs from each user the
backfiller will try to start on each run. It doesn't do exactly what
you want, but at least the backfiller …
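Setting `bf_max_job_user` is a one-line change in slurm.conf; the value here is an illustrative assumption:

```ini
# slurm.conf sketch -- cap how many jobs per user the backfill
# scheduler will consider starting on each run (value is an assumption)
SchedulerParameters=bf_max_job_user=10,bf_continue
```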
One of these TRES-related ones in a QOS ought to do it:
https://slurm.schedmd.com/resource_limits.html
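As a hedged sketch of the kind of TRES limit meant here (the QOS name and cap are placeholders, not a recommendation):

```shell
# Cap the CPUs any single user can hold at once under this QOS:
sacctmgr modify qos normal set MaxTRESPerUser=cpu=512
# Optionally also cap concurrent running jobs per user:
sacctmgr modify qos normal set MaxJobsPerUser=50
```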
Your problem there, though, is you will eventually have stuff waiting to run
even when the system is idle. We had the same circumstance and the same
eventual outcome.
We have two main issues with our scheduling policy right now. The first is an
issue that we call "queue stuffing." The second is an issue with interactive
job availability. We aren't confused about why these issues exist, but we
aren't sure the best way to address them.
I'd love to hear any suggestions.