Hello there, I am having trouble understanding how the Slurm scheduler handles the "nice" parameter.
I have two types of job: a low-priority one that uses 4 CPUs (--nice=20) and a high-priority one that uses 24 CPUs (--nice=10). When I submit, say, 50 low-priority jobs, only 6 run at a time. This is fine, since each job uses 4 CPUs and the node has 24.

However, when I submit my high-priority job, which must use all 24 CPUs, things get strange.

What I was expecting:

- Slurm would stop starting queued low-priority jobs (switching them from PD to R).
- It would wait until 24 CPUs were free (in this case, until no jobs were running).
- It would run the high-priority job.
- When that job completed, it would start the low-priority jobs as usual.

What I observed instead:

- Slurm kept starting queued jobs as if I hadn't specified a nice parameter at all.

(Partial) Slurm config:

    SwitchType=switch/none
    TaskPlugin=task/none
    FastSchedule=1
    SchedulerType=sched/backfill
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory
    NodeName=node01 CPUs=24 RealMemory=120000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN

Low-priority job:

    #SBATCH --job-name=task4
    #SBATCH --ntasks=4
    #SBATCH --mem=1gb
    #SBATCH --time=10:00:00
    #SBATCH --output=%j.out
    #SBATCH --error=%j.err
    #SBATCH --partition=ogre
    #SBATCH --account=ogre
    #SBATCH --nice=20

High-priority job:

    #SBATCH --job-name=task24
    #SBATCH --ntasks=24
    #SBATCH --mem=1gb
    #SBATCH --time=10:00:00
    #SBATCH --output=%j.out
    #SBATCH --error=%j.err
    #SBATCH --partition=ogre
    #SBATCH --account=ogre
    #SBATCH --nice=10

Do you have any idea what I am missing?

Thanks a lot,
Matteo
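For reference, the scheduling I expected can be written as a small POSIX sh toy model. It is only a sketch under stated assumptions, not Slurm's algorithm. Both job types get the same made-up base priority (1000), and the nice value is simply subtracted from it. That is how the priority/multifactor documentation describes nice, but my partial config does not show which priority plugin is active. The helpers priority, strict_pass and first_fit_pass are hypothetical names used only for illustration.

```sh
#!/bin/sh
# Toy model of the expected scheduling; NOT Slurm's backfill algorithm.

# Node geometry from the NodeName line: 2 sockets x 6 cores x 2 threads.
node_cpus=$(( 2 * 6 * 2 ))   # 24 logical CPUs

# Assumption: both job types share a made-up base priority of 1000 and the
# nice value is subtracted from it, so a smaller nice means higher priority.
base_priority=1000
priority() { echo $(( base_priority - $1 )); }

# strict_pass FREE_CPUS JOB...  (each JOB is name:cpus, highest priority first)
# Starts jobs in order and stops at the first one that does not fit, so a
# blocked high-priority job holds back everything queued behind it.
strict_pass() {
  free=$1; shift
  started=""
  for job in "$@"; do
    need=${job#*:}
    if [ "$need" -gt "$free" ]; then
      break
    fi
    free=$(( free - need ))
    started="$started ${job%%:*}"
  done
  echo "${started# }"
}

# first_fit_pass FREE_CPUS JOB...  Skips jobs that do not fit and keeps going,
# which mimics the observed symptom: small jobs slip past the big one.
first_fit_pass() {
  free=$1; shift
  started=""
  for job in "$@"; do
    need=${job#*:}
    if [ "$need" -gt "$free" ]; then
      continue
    fi
    free=$(( free - need ))
    started="$started ${job%%:*}"
  done
  echo "${started# }"
}

echo "task24 priority: $(priority 10)"                     # 990
echo "task4 priority:  $(priority 20)"                     # 980
echo "task4 jobs that fit at once: $(( node_cpus / 4 ))"   # 6

# One task4 job has just finished, freeing 4 CPUs.
echo "strict:    [$(strict_pass 4 task24:24 task4:4 task4:4)]"      # []
echo "first fit: [$(first_fit_pass 4 task24:24 task4:4 task4:4)]"   # [task4]
```

With 4 CPUs free, the strict pass starts nothing and keeps task24 at the head of the queue, while the first-fit pass starts another task4, which is the symptom described above. Neither function reproduces SchedulerType=sched/backfill, whose real logic is more involved.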