Hi David,

If your maximum run-time is more than the 2 1/2 days (3600 minutes) you
have set for bf_window, you might need to increase bf_window
accordingly. See the description here:

https://slurm.schedmd.com/sched_config.html

Cheers,

Loris

Baker D.J. <d.j.ba...@soton.ac.uk> writes:

> Hello,
>
> A colleague intimated that he thought that larger jobs were tending to
> get starved out on our slurm cluster. It's not a busy time at the
> moment so it's difficult to test this properly. Back in November it
> was not completely unusual for a larger job to have to wait up to a
> week to start.
>
> I've extracted the key scheduling configuration out of the slurm.conf
> and I would appreciate your comments, please. Even at the busiest of
> times we notice many single compute jobs executing on the cluster --
> starting either via the scheduler or by backfill.
>
> Looking at the scheduling configuration do you think that I'm
> favouring small jobs too much? That is, for example, should I increase
> the PriorityWeightJobSize to encourage larger jobs to run?
>
> I was very keen not to starve out small/medium jobs, however perhaps
> there is too much emphasis on small/medium jobs in our setup.
>
> My colleague is from a Moab background, and in that respect he was
> surprised not to see nodes being reserved for jobs, but it could be
> that Slurm works in a different way to try to make efficient use of
> the cluster by backfilling more aggressively than Moab. Certainly we
> see a great deal of activity from backfill.
>
> In this respect does anyone understand the mechanism used to reserve
> nodes/resources for jobs in slurm or potentially where to look for
> that type of information.
>
> Best regards,
> David
>
> SchedulerType=sched/backfill
> SchedulerParameters=bf_window=3600,bf_resolution=180,bf_max_job_user=4
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
> FastSchedule=1
> PriorityFavorSmall=NO
> PriorityFlags=DEPTH_OBLIVIOUS,SMALL_RELATIVE_TO_TIME,FAIR_TREE
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=14-0
>
> PriorityWeightFairshare=1000000
> PriorityWeightAge=100000
> PriorityWeightPartition=0
> PriorityWeightJobSize=100000
> PriorityWeightQOS=10000
> PriorityMaxAge=7-0
>

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email loris.benn...@fu-berlin.de
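
For anyone wanting to check the bf_window arithmetic against their own
limits, here is a rough sketch in Python. It converts a Slurm-style time
limit into minutes (the unit bf_window uses) and compares it with the
configured window. The helper names and the 7-day maximum run-time are
assumptions made up for illustration; the thread does not say what the
actual maximum run-time on David's cluster is.

```python
def slurm_time_to_minutes(limit: str) -> int:
    """Convert a Slurm time string to whole minutes, rounding up.

    Handles "M", "M:S", "H:M:S", "D-H", "D-H:M" and "D-H:M:S".
    """
    days = 0
    if "-" in limit:
        # With a "days-" prefix the remaining fields are H[:M[:S]].
        day_part, rest = limit.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognised time limit: {limit!r}")
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in limit.split(":")]
        if len(fields) == 1:        # "M"
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:      # "M:S"
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:      # "H:M:S"
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognised time limit: {limit!r}")
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return (total_seconds + 59) // 60  # round up to a whole minute


def bf_window_is_sufficient(bf_window_minutes: int, max_time: str) -> bool:
    """True if the backfill window reaches at least as far as max_time."""
    return bf_window_minutes >= slurm_time_to_minutes(max_time)


if __name__ == "__main__":
    current_bf_window = 3600             # from the quoted slurm.conf
    hypothetical_max_time = "7-00:00:00"  # assumed value, not from the thread
    needed = slurm_time_to_minutes(hypothetical_max_time)
    if not bf_window_is_sufficient(current_bf_window, hypothetical_max_time):
        print(f"bf_window={current_bf_window} is shorter than the longest "
              f"job ({needed} min); consider bf_window={needed}")
```

With the assumed 7-day limit this would suggest bf_window=10080. The
slurm.conf man page also recommends increasing bf_resolution when
bf_window is increased, so the two are probably best adjusted together.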