Per-partition priority information will be available in Slurm version 17.11.
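In the meantime, squeue's priority-ordered listing will at least show the per-partition entries for a pending multi-partition job (though, as discussed below, the priority value shown is the same for each entry until the 17.11 change). A rough sketch; the format fields are standard squeue specifiers, and the job ID is just the one from Gary's example:

    # One line per (job, partition) pair for pending jobs submitted to
    # multiple partitions, showing partition and integer priority.
    squeue --priority --format="%.10i %.12P %.10Q %.8u %.4t %S"

    # Per-factor priority breakdown for a specific job.
    sprio -j 1878958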
On Tue, Aug 15, 2017, at 09:33 AM, Skouson, Gary B wrote:

> I've also seen that. I'm not sure it's a "bug"; it's just a result of the current structure of the code.
>
> The job structure doesn't have a place to put multiple priorities, so you end up with whatever priority was checked last during scheduling.
>
> If you plow through the multifactor backfill stuff, it actually checks the job with the appropriate priority for each of the partitions, but it seems to leave whatever gets checked last in the job structure. Since the lowest priority is checked last, that usually ends up as the job priority in the job structure.
>
> There are reasons why the backfill code may not get through all the jobs, so it's not always the case that the lowest priority is the one that sticks in the job record, but that seems to be the usual result.
>
> With our configuration, I can submit a set of jobs that can't start right away, and the priorities look like:
>
>   JOBID    PARTITION   USER     ACCOUNT  NOD  ST  TIME_LEFT  START_TIME  SUBMIT_TIME          PRIOR  NODELIST(REASON)
>   1878958  large,bkfi  skouson  mscfops  256  PD  8:00:00    N/A         2017-08-15T08:07:41  24000  (None)
>   1878959  large,bkfi  skouson  mscfops  256  PD  8:00:00    N/A         2017-08-15T08:07:41  24000  (None)
>   1878960  large,bkfi  skouson  mscfops  256  PD  8:00:00    N/A         2017-08-15T08:07:41  24000  (None)
>   1878961  large,bkfi  skouson  mscfops  256  PD  8:00:00    N/A         2017-08-15T08:07:41  24000  (None)
>   1878962  large,bkfi  skouson  mscfops  256  PD  8:00:00    N/A         2017-08-15T08:07:41    300  (None)
>   1878963  large,bkfi  skouson  mscfops  256  PD  8:00:00    N/A         2017-08-15T08:07:41    300  (None)
>
> The large partition requires a QOS with a 4-job limit, but the bkfill partition (with a lower priority that doesn't allow resource reservation) can run lots of jobs. The initial squeue immediately after submission shows the priorities above. Sometimes the initial squeue shows some of each priority, sometimes it's all the same priority; I'm not sure why that is.
>
> Waiting for the backfill scheduler to run results in each job getting an estimated start time, since they're in the large partition with a priority above the bf_min_prio_reserve threshold. However, the job priority shown is only 304, which is the priority backfill checks last. Running sprio also reports the same priority of 304.
>
> I think this could be fixed to list each job/partition combination with its own priority; however, I didn't see an easy way to do that without changes to the job structure.
>
> -----
> Gary Skouson
>
>
> -----Original Message-----
> From: Corey Keasling [mailto:[email protected]]
> Sent: Monday, August 14, 2017 12:16 PM
> To: slurm-dev <[email protected]>
> Subject: [slurm-dev] Re: Job priority calculation when submitted to multiple partitions with different priorities
>
>
> Once more, hello Slurm-Dev,
>
> The problem remains after upgrading to 17.02.6 today. A job submitted to multiple partitions and pending for Resources has a single priority which reflects the PriorityJobFactor of the partition that is first in the list. Is this a bug? I spent a while digging through the bug tracker and couldn't find anything, although changelog entries for 17.11 might be relevant. Thoughts?
>
> Thank you!
>
> Corey
>
> On 08/11/2017 02:38 PM, Corey Keasling wrote:
> >
> > Hello again,
> >
> > Looks like I'll make more definite plans to upgrade. Per the Changelog for 17.02.3:
> >
> >   -- Fix updating job priority on multiple partitions to be correct.
> >
> > Corey
> >
> > --
> > Corey Keasling
> > Software Manager
> > JILA Computing Group
> > University of Colorado-Boulder
> > 440 UCB Room S244
> > Boulder, CO 80309-0440
> > 303-492-9643
> >
> > On 08/11/2017 01:50 PM, Corey Keasling wrote:
> >>
> >> Hi Slurm-Dev,
> >>
> >> I'm trying to determine how a job's multifactor priority is calculated when the job is submitted to multiple partitions where each partition has a different priority factor. I'm running 16.05.6 with ill-defined plans to move to 17.02.
> >>
> >> My cluster is partitioned such that one partition is a subset of another, with the subset having a 10x higher PriorityJobFactor. The intent is to give greater priority on the subset to the group that purchased it while allowing all users to run on all nodes. Thus I hope to permit the privileged group to submit jobs to both partitions simultaneously, but to have their greater priority apply only to the subset. However, based on squeue and sprio, this may not be happening.
> >>
> >> squeue -P reports identical priorities for both entries (i.e., the same job but considered for p1 and p2). sprio seems to report the priority as calculated for the first partition in the list (i.e., if submitted via sbatch -p p1,p2 the job gets the p1 priority factor, while sbatch -p p2,p1 gives the p2 priority factor).
> >>
> >> So what's actually going on under the hood? Does the scheduler calculate priorities for each (job, partition) pair separately, or only once?
> >>
> >> Thank you for your help!
> >>
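For reference, here is a minimal sketch of the kind of overlapping-partition setup described in this thread. All names, node ranges, weights, and limits are illustrative guesses, not either site's actual configuration; adjust them to your cluster.

    # slurm.conf (excerpt) -- two partitions over the same nodes; "large" gets a
    # 10x partition factor and a partition QOS that caps concurrent jobs, while
    # "bkfill" sits below bf_min_prio_reserve so its jobs never reserve resources.
    SchedulerParameters=bf_min_prio_reserve=1000
    PriorityWeightPartition=10000
    PartitionName=large  Nodes=node[001-512] PriorityJobFactor=10 QOS=large_qos State=UP
    PartitionName=bkfill Nodes=node[001-512] PriorityJobFactor=1  State=UP

    # Submit one job to both partitions; Slurm starts it in whichever
    # partition can run it first.
    sbatch --partition=large,bkfill --nodes=256 --time=8:00:00 job.sh

With a setup like this, the job should carry the large-partition factor only for its entry in large; before 17.11, the single value stored in the job record (and reported by squeue and sprio) is whichever partition's priority the scheduler computed last.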
