> Eventually the job aging makes the jobs so high-priority,

Guess I should look in the manual, but could you increase the job-ageing time parameters? It is also worth saying that this is the scheduler doing its job - it is supposed to keep jobs ready and waiting to go, to keep the cluster busy!
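For reference, an untested sketch of what I mean by the ageing parameters, assuming you are using the multifactor priority plugin; the values below are placeholders, not recommendations:

```
# slurm.conf - example values only, adjust for your site
PriorityType=priority/multifactor
# Weight of the age factor in the overall job priority
PriorityWeightAge=1000
# Queue time after which the age factor saturates; raising this from the
# default of 7-0 (7 days) makes queued jobs climb to top priority more slowly
PriorityMaxAge=14-0
```

I believe `scontrol reconfigure` picks these up without a slurmctld restart, but check the slurm.conf man page for your version.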
I was going to suggest a cron job which looks at the jobs the 'queue stuffer' has submitted and moves some of them down in priority. This is a bad suggestion: in general, writing a 'scheduler within a scheduler' is not a good idea, and you only end up fighting the real scheduler.

I had a similar situation at my last job - a user needed to get some work done and submitted a huge number of jobs. There happened to be a low load on the cluster at the time, so this user got a lot of jobs started. We finally had to temporarily limit the maximum number of jobs he could submit. Again, if you think about it, this is a good thing - we operate batch queueing systems, and this user was putting one to good use.

The 'problem' is more related to the length of the jobs. If the 'queue stuffer' is submitting jobs with a long wallclock time, then yes, you will get complaints from the other users. With shorter jobs there is more opportunity for other users to 'get a look in', as we say in Glasgow.

What IS bad is users not putting cluster resources to good use. You can often see jobs which are 'stalled' - i.e. the nodes are reserved for the job, but the internal logic of the job has failed and the executables have never launched. Or maybe a user is running an interactive job and has wandered off for coffee/beer/an extended holiday. It is well worth scanning for stalled jobs and terminating them.

On 8 May 2018 at 09:25, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:

> On 05/08/2018 08:44 AM, Bjørn-Helge Mevik wrote:
>
>> Jonathon A Anderson <jonathon.ander...@colorado.edu> writes:
>>
>>> ## Queue stuffing
>>
>> There is the bf_max_job_user SchedulerParameter, which is sort of the
>> "poor man's MAXIJOB"; it limits the number of jobs from each user the
>> backfiller will try to start on each run. It doesn't do exactly what
>> you want, but at least the backfiller will not create reservations for
>> _all_ the queue stuffer's jobs.
>
> Adding to this, I discuss backfilling configuration in
> https://wiki.fysik.dtu.dk/niflheim/Slurm_scheduler#scheduler-configuration
>
> The MaxJobCount limit etc. is described in
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#maxjobcount-limit
>
> /Ole
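P.S. For the record, the per-user job cap I mentioned above was done through the accounting database. Assuming slurmdbd is in use, it is a one-liner along these lines (the user name and limit are placeholders):

```
sacctmgr modify user where name=someuser set MaxSubmitJobs=100
```

As I recall, setting MaxSubmitJobs back to -1 removes the limit again - worth double-checking in the sacctmgr man page.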
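P.P.S. On the stalled-jobs point above, a rough sketch of the kind of scan I mean. This is an untested illustration, not a drop-in tool: it assumes squeue and sstat are on the path, that sstat reports step accounting for your jobs, and the "AveCPU still zero" heuristic is my own crude proxy for "payload never launched" - verify the output formats against your Slurm version, and review the candidate list by hand before ever pointing scancel at it.

```shell
#!/bin/bash
# Sketch: flag running jobs whose steps have consumed ~zero CPU time,
# a rough sign that the job's executables never really launched.

# filter_stalled: reads "jobid avecpu" pairs on stdin and prints the
# job ids whose average CPU time is still zero.
filter_stalled() {
    awk '$2 == "00:00:00" { print $1 }'
}

scan_stalled() {
    # List running job ids (-h: no header, %i: job id), then ask sstat
    # for the average CPU time across each job's steps.
    squeue -h -t RUNNING -o '%i' | while read -r jobid; do
        avecpu=$(sstat -n -P -j "$jobid" --format=AveCPU 2>/dev/null | head -n 1)
        echo "$jobid ${avecpu:-unknown}"
    done | filter_stalled
}

# Print candidates only; terminating them (scancel) should stay a
# manual, double-checked step.
# scan_stalled
```

The point of keeping the filter as a separate function is that you can sanity-check the heuristic on canned output before letting it loose on the live queue.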