Per-partition priority information will be available in Slurm version
17.11

On Tue, Aug 15, 2017, at 09:33 AM, Skouson, Gary B wrote:
> I've also seen that.  I'm not sure it's a "bug".  It's just a result of
> the current structure of the code.
>  
> The job structure doesn't have a place to put multiple priorities, so it
> seems you end up with the priority of whichever partition was checked
> last during scheduling.
> 
> If you plow through the multifactor backfill code, it actually evaluates
> the job with the appropriate priority for each of its partitions, but it
> seems to leave the value from whichever partition was checked last in the
> job structure.  Since the lowest priority is checked last, that usually
> ends up as the job priority in the job structure.
> 
> There are reasons why the backfill code may not get through all the jobs,
> so it's not always the case that the lowest priority is the one that
> sticks in the job record, but that seems to be the usual result.
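> 
> To make that concrete, here is a minimal C sketch (illustrative only, not
> the actual Slurm source; the struct and function names are made up) of
> why a single priority slot in the job record ends up holding whichever
> partition was evaluated last:
> 
> #include <stdio.h>
> #include <stdint.h>
> 
> struct part_rec { const char *name; uint32_t prio_factor; };
> struct job_rec  { uint32_t job_id; uint32_t priority; };   /* single slot */
> 
> static uint32_t eval_priority(struct job_rec *job, struct part_rec *part)
> {
>     (void) job;
>     return part->prio_factor;       /* stand-in for the multifactor calc */
> }
> 
> int main(void)
> {
>     struct part_rec parts[] = { { "large", 24000 }, { "bkfill", 300 } };
>     struct job_rec  job     = { 1878958, 0 };
> 
>     /* the higher-priority partition is checked first, the lower one last */
>     for (int i = 0; i < 2; i++)
>         job.priority = eval_priority(&job, &parts[i]);     /* overwrites */
> 
>     /* the value that "sticks" is whichever was checked last: 300 */
>     printf("job %u priority = %u\n",
>            (unsigned) job.job_id, (unsigned) job.priority);
>     return 0;
> }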
> 
> With our configuration, I can submit a set of jobs that can't start right
> away and the priorities look like:
> 
> JOBID   PARTITION  USER     ACCOUNT    NOD ST  TIME_LEFT START_TIME         SUBMIT_TIME         PRIOR NODELIST(REASON)
> 1878958 large,bkfi skouson  mscfops    256 PD    8:00:00 N/A                2017-08-15T08:07:41 24000 (None)
> 1878959 large,bkfi skouson  mscfops    256 PD    8:00:00 N/A                2017-08-15T08:07:41 24000 (None)
> 1878960 large,bkfi skouson  mscfops    256 PD    8:00:00 N/A                2017-08-15T08:07:41 24000 (None)
> 1878961 large,bkfi skouson  mscfops    256 PD    8:00:00 N/A                2017-08-15T08:07:41 24000 (None)
> 1878962 large,bkfi skouson  mscfops    256 PD    8:00:00 N/A                2017-08-15T08:07:41 300   (None)
> 1878963 large,bkfi skouson  mscfops    256 PD    8:00:00 N/A                2017-08-15T08:07:41 300   (None)
> 
> The large partition requires a QOS with a 4-job limit, but the bkfill
> partition (with a lower priority that doesn't allow a resource
> reservation) can run lots of jobs.  The initial squeue immediately after
> submission shows the priorities above.  Sometimes the initial squeue
> shows some of each priority, and sometimes it's all the same priority;
> I'm not sure why that is.
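> 
> For reference, a hypothetical slurm.conf fragment approximating that
> setup might look like the following; the node names, factor values, and
> QOS name are placeholders rather than our real configuration:
> 
> # higher-priority partition, gated by a QOS that limits running jobs
> PartitionName=large  Nodes=node[001-512] PriorityJobFactor=2000 QOS=large_limit
> # lower-priority partition, open for backfilled work
> PartitionName=bkfill Nodes=node[001-512] PriorityJobFactor=25
> # jobs whose priority falls below this value are backfilled but never
> # receive a resource reservation
> SchedulerParameters=bf_min_prio_reserve=1000
> # large_limit would be a QOS created with sacctmgr, limited to e.g. GrpJobs=4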
> 
> Waiting for the backfill scheduler to run results in each job getting an
> estimated start time, since they're in the large partition with a
> priority above the bf_min_prio_reserve threshold.  However, the job
> priority is only 304, which is the priority backfill checks last.
> Running sprio also reports the same priority of 304.
>  
> I think this could be fixed to list each job/partition combination with
> its own priority; however, I didn't see an easy way to do that without
> changes to the job structure.
> 
> -----
> Gary Skouson
> 
> 
> -----Original Message-----
> From: Corey Keasling [mailto:[email protected]] 
> Sent: Monday, August 14, 2017 12:16 PM
> To: slurm-dev <[email protected]>
> Subject: [slurm-dev] Re: Job priority calculation when submitted to
> multiple partitions with different priorities
> 
> 
> Once more, hello Slurm-Dev,
> 
> The problem remains after upgrading to 17.02.6 today.  A job submitted 
> to multiple partitions and pending for Resources has a single priority 
> which reflects the PriorityJobFactor of the partition that is first in 
> the list.  Is this a bug?  I spent a while digging through the bug 
> tracker and couldn't find anything, although changelog entries for 17.11 
> might be relevant.  Thoughts?
> 
> Thank you!
> 
> Corey
> 
> On 08/11/2017 02:38 PM, Corey Keasling wrote:
> >
> > Hello again,
> >
> > Looks like I'll make more definite plans to upgrade.  Per the Changelog
> > for 17.02.3:
> >
> >  -- Fix updating job priority on multiple partitions to be correct.
> >
> > Corey
> >
> > --
> > Corey Keasling
> > Software Manager
> > JILA Computing Group
> > University of Colorado-Boulder
> > 440 UCB Room S244
> > Boulder, CO 80309-0440
> > 303-492-9643
> >
> > On 08/11/2017 01:50 PM, Corey Keasling wrote:
> >>
> >> Hi Slurm-Dev,
> >>
> >> I'm trying to determine how a job's multifactor priority is calculated
> >> when the job is submitted to multiple partitions where each partition
> >> has a different priority factor.  I'm running 16.05.6 with ill-defined
> >> plans to move to 17.02.
> >>
> >> My cluster is partitioned such that one partition is a subset of another
> >> with the subset having a 10x higher PriorityJobFactor.  The intent is to
> >> give greater priority on the subset to the group that purchased it while
> >> allowing all users to run on all nodes.  Thus I hope to permit the
> >> privileged group to submit jobs to both partitions simultaneously, but
> >> to have their greater priority apply only to the subset.  However, based
> >> on squeue and sprio, this may not be happening.
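> >>
> >> For concreteness, the layout is along the lines of this hypothetical
> >> slurm.conf fragment (partition names, node names, group, and factor
> >> values below are placeholders, not our actual settings):
> >>
> >> PartitionName=everyone Nodes=node[01-64] PriorityJobFactor=1
> >> PartitionName=owners   Nodes=node[01-16] PriorityJobFactor=10 AllowGroups=owner_grp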
> >>
> >> squeue -P reports identical priorities for both entries (i.e., the same
> >> job but considered for p1 and p2).  sprio seems to report the priority
> >> as calculated for the first partition in the list (i.e., if submitted
> >> via sbatch -p p1,p2 the job gets the p1 priority factor, while sbatch
> >> -p p2,p1 gives the p2 priority factor).
> >>
> >> So what's actually going on under the hood?  Does the scheduler
> >> calculate priorities for each (job,partition) pair separately, or only
> >> once?
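> >>
> >> For reference, the multifactor plugin documentation gives the job
> >> priority as roughly (omitting the TRES terms):
> >>
> >>   Job_priority = PriorityWeightAge       * age_factor
> >>                + PriorityWeightFairshare * fairshare_factor
> >>                + PriorityWeightJobSize   * job_size_factor
> >>                + PriorityWeightPartition * partition_factor
> >>                + PriorityWeightQOS       * qos_factor
> >>                - nice_factor
> >>
> >> where partition_factor is derived from the partition's PriorityJobFactor,
> >> so that term is the only one I'd expect to differ between p1 and p2.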
> >>
> >> Thank you for your help!
> >>
> 
