I can confirm that we do preemption based on partition for one of our clusters. I will say that we are not using time-based partitions: ours are always Up, and they are based on group node ownership.

I wonder if Slurm is refusing to preempt a job in a DOWN partition. Maybe try leaving the partition Up, but just change the priority of the partition.

One other suggestion would be to turn up the debugging on the Slurm controller and/or use DebugFlags. I don't know for sure which flag would give the best data, but I would start with the Priority flag. With the right debugging turned on, slurmctld.log should give you more data on how and why it is making its decisions.
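As a sketch, the two suggestions above could be tried with commands along these lines; the partition name comes from the thread, and the debug level (debug3) is an arbitrary choice:

```shell
# Keep the night partition schedulable during the day instead of marking it Down;
# only its PriorityTier drops:
scontrol update PartitionName=night State=UP PriorityTier=1

# Raise slurmctld logging and enable the Priority debug flag:
scontrol setdebug debug3
scontrol setdebugflags +Priority
```

Both settings can be reverted the same way (`scontrol setdebug info`, `scontrol setdebugflags -Priority`) once enough data has been collected.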
Mike

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Russell Jones <arjone...@gmail.com>
Date: Tuesday, August 24, 2021 at 10:36
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [External] Re: [slurm-users] Preemption not working for jobs in higher priority partition

I have been researching this further, and I see other systems that appear to be set up the same way ours is. Example:

https://hpcrcf.atlassian.net/wiki/spaces/TCP/pages/733184001/How-to+Use+the+preempt+Partition

Any further insight into what may be wrong with our setup is appreciated. I am not seeing what is wrong with my config, but it also isn't working anymore to allow preemption.

On Fri, Aug 20, 2021 at 9:46 AM Russell Jones <arjone...@gmail.com> wrote:

I could have sworn I had tested this before implementing it and it worked as expected. If I am dreaming that testing - is there a way of allowing preemption across partitions?

On Fri, Aug 20, 2021 at 8:40 AM Brian Andrus <toomuc...@gmail.com> wrote:

IIRC, preemption is determined by partition first, not node. Since your pending job is in the 'day' partition, it will not preempt something in the 'night' partition (even if the node is in both).
Brian Andrus

On 8/19/2021 2:49 PM, Russell Jones wrote:

Hi all,

I could use some help understanding why preemption is not working properly for me. I have a job blocking other jobs in a way that doesn't make sense to me. Any assistance is appreciated, thank you!

I have two partitions defined in Slurm, a day-time partition and a night-time partition:

Day partition - PriorityTier of 5, always Up. Limited resources under this QOS.

Night partition - PriorityTier of 5 during the night; during the day it is set to Down and its PriorityTier is changed to 1. Jobs can be submitted to the night queue under an unlimited QOS as long as resources are available.

The thought here is that jobs can continue to run in the night partition, even during the day, until resources are requested from the day partition. Jobs in the night partition would then be requeued/canceled to satisfy those requests.

Current output of "scontrol show part":

PartitionName=day
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=part_day
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=cluster-r1n[01-13],cluster-r2n[01-08]
   PriorityJobFactor=1 PriorityTier=5 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=REQUEUE
   State=UP TotalCPUs=336 TotalNodes=21 SelectTypeParameters=NONE
   JobDefaults=(null) DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

PartitionName=night
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=part_night
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=22 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=cluster-r1n[01-13],cluster-r2n[01-08]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=REQUEUE
   State=DOWN TotalCPUs=336 TotalNodes=21 SelectTypeParameters=NONE
   JobDefaults=(null) DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

I currently have a job in the night partition that is blocking jobs in the day partition, even though the day partition has a PriorityTier of 5 and the night partition is Down with a PriorityTier of 1.

My current slurm.conf preemption settings are:

PreemptMode=REQUEUE
PreemptType=preempt/partition_prio

The blocking job's "scontrol show job" output is:

JobId=105713 JobName=jobname
   Priority=1986 Nice=0 Account=xxx QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=17:49:39 TimeLimit=7-00:00:00 TimeMin=N/A
   SubmitTime=2021-08-18T22:36:36 EligibleTime=2021-08-18T22:36:36
   AccrueTime=2021-08-18T22:36:36
   StartTime=2021-08-18T22:36:39 EndTime=2021-08-25T22:36:39 Deadline=N/A
   PreemptEligibleTime=2021-08-18T22:36:39 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-08-18T22:36:39
   Partition=night AllocNode:Sid=cluster-1:1341505
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=cluster-r1n[12-13],cluster-r2n[04-06]
   BatchHost=cluster-r1n12
   NumNodes=5 NumCPUs=80 NumTasks=5 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=80,node=5,billing=80,gres/gpu=20
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)

The job that is being blocked:

JobId=105876 JobName=bash
   Priority=2103 Nice=0 Account=xxx QOS=normal
   JobState=PENDING Reason=Nodes_required_for_job_are_DOWN,_DRAINED_or_reserved_for_jobs_in_higher_priority_partitions Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2021-08-19T16:19:23 EligibleTime=2021-08-19T16:19:23
   AccrueTime=2021-08-19T16:19:23
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-08-19T16:26:43
   Partition=day AllocNode:Sid=cluster-1:2776451
   ReqNodeList=(null) ExcNodeList=(null) NodeList=(null)
   NumNodes=3 NumCPUs=40
   NumTasks=40 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=40,node=1,billing=40
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)

Why is the day job not preempting the night job?
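[Editor's note: for comparison, a minimal partition-priority preemption setup along the lines of the linked wiki page might look like the slurm.conf sketch below. This is an assumed configuration, not the poster's actual file; the key difference from the setup described above is that both partitions stay Up, so the scheduler can still consider jobs in the low-tier partition as preemption candidates.]

```
# Cluster-wide preemption policy
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# Overlapping partitions; only PriorityTier differs, and both remain Up.
# Jobs in the tier-5 partition may preempt (requeue) jobs in the tier-1 partition.
PartitionName=day   Nodes=cluster-r1n[01-13],cluster-r2n[01-08] PriorityTier=5 PreemptMode=REQUEUE State=UP
PartitionName=night Nodes=cluster-r1n[01-13],cluster-r2n[01-08] PriorityTier=1 PreemptMode=REQUEUE State=UP
```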