Question for the braintrust: I have 3 partitions:
- Partition A_highpri: 80 nodes
- Partition A_lowpri: the same 80 nodes
- Partition B_lowpri: 10 different nodes

There is no overlap between the A and B partitions.

Here is what I'm observing. If I fill the queue with ~20-30k jobs for partition A_highpri and several thousand for partition A_lowpri, and then, a bit later, submit jobs to partition B_lowpri, the partition B jobs *are queued rather than started right away, and are given a pending reason of "Priority"*, which doesn't seem right to me. Yes, there are higher-priority jobs pending in the queue (the jobs bound for A_highpri), but there aren't any higher-priority jobs pending *for the same partition* as the partition B jobs, so in theory these partition B jobs should not be held up.

Eventually the scheduler gets around to scheduling them, but it seems to take a while for the scheduler (which is probably pretty busy dealing with job starts, job stops, etc.) to figure this out. If I submit fewer jobs to the A partitions (~3k jobs), then the scheduler schedules the partition B jobs much faster, as expected. As I increase the count beyond 3k, the partition B jobs get held up longer and longer.

I can raise the priority on partition B, and that does solve the problem, but I don't want those jobs to impact the partition A_lowpri jobs. In fact, *I don't want any cross-partition influence*.

I'm hoping there is a Slurm parameter I can tweak to make Slurm recognize that these partition B jobs shouldn't ever have a pending reason of "Priority". Or to treat these as 2 separate queues. Or something like that.

Spinning up a 2nd Slurm controller is not ideal for us (unless there is a lightweight method to do it).

Thanks
David