I don't really have an answer for you beyond a "hallway comment": this sounds like a good thing to test with a simulator, which is what I would do if I had one. I've been intrigued by this one, though I haven't really looked into it: https://slurm.schedmd.com/SLUG23/LANL-Batsim-SLUG23.pdf

On Fri, Sep 29, 2023 at 10:05 AM Groner, Rob <rug...@psu.edu> wrote:

> On our system, for some partitions, we guarantee that a job can run at
> least an hour before being preempted by a higher priority job. We use the
> QOS preempt exempt time for this, and it appears to be working. But of
> course, I want to TEST that it works.
>
> So on a test system, I start a lower priority job on a specific node, wait
> until it starts running, and then I start a higher priority job for the
> same node. The test should only pass if the higher priority job has an
> OPPORTUNITY to preempt the lower priority job, and doesn't.
>
> Now, I know I can get a preempt eligible time out of scontrol for the
> lower priority job and verify that it's set for an hour (I do check that
> already), but that's not good enough for me. I could obviously let the
> test run for an hour to verify the lower priority job was never
> preempted...but that's not really feasible. So instead, I want to verify
> that the higher priority job has had a chance to preempt the lower priority
> job, and it did not.
>
> So far, the way I've been doing that is to check the reported Scheduler in
> the scontrol job output for the higher priority job. I figure that when
> the scheduler changes to Backfill instead of Main, then the higher priority
> job has been seen by the main scheduler and it passed on the chance to
> preempt the lower priority job.
>
> Is that a good assumption? Is there any other, or potentially quicker,
> way to verify that the higher priority job will NOT preempt the lower
> priority job?
>
> Rob
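
For what it's worth, the check described in the quoted message could be sketched roughly as below. This is only an illustration, not a tested harness: the helper names and the sample job text are hypothetical, the sample output is abbreviated and made up, and treating Scheduler=Backfill as proof that the main scheduler passed on preemption is exactly the assumption being asked about, not something this code confirms. It also assumes that PreemptEligibleTime is the start time plus the exempt time.

```python
from datetime import datetime

# Hypothetical, abbreviated `scontrol show job` output for the two test jobs.
SAMPLE_LOW = """JobId=101 JobName=low
   JobState=RUNNING Reason=None
   StartTime=2023-09-29T10:00:00 EndTime=2023-09-29T14:00:00
   PreemptEligibleTime=2023-09-29T11:00:00 PreemptTime=None
   Scheduler=Main"""

SAMPLE_HIGH = """JobId=102 JobName=high
   JobState=PENDING Reason=Resources
   StartTime=Unknown EndTime=Unknown
   Scheduler=Backfill"""


def parse_scontrol_job(text):
    """Split whitespace-separated KEY=VALUE tokens into a dict.

    Values containing spaces are not handled; the first occurrence
    of a key wins.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields.setdefault(key, value)
    return fields


def exempt_seconds(low):
    """Seconds between the job's start and its preempt eligible time."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    start = datetime.strptime(low["StartTime"], fmt)
    eligible = datetime.strptime(low["PreemptEligibleTime"], fmt)
    return (eligible - start).total_seconds()


def passed_on_preemption(low, high):
    """True if the low job still runs while the high job is pending and
    reports Scheduler=Backfill (the assumption from the quoted message)."""
    return (
        low.get("JobState") == "RUNNING"
        and high.get("JobState") == "PENDING"
        and high.get("Scheduler") == "Backfill"
    )


low = parse_scontrol_job(SAMPLE_LOW)
high = parse_scontrol_job(SAMPLE_HIGH)
print(exempt_seconds(low), passed_on_preemption(low, high))
```

In a real test the two strings would come from running `scontrol show job <jobid>` for each job, polled until the higher priority job's Scheduler field changes or a timeout expires.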