I could have sworn I had tested this before implementing it and that it worked as expected.
If I am imagining that testing, is there a way of allowing preemption across partitions?

On Fri, Aug 20, 2021 at 8:40 AM Brian Andrus <toomuc...@gmail.com> wrote:

> IIRC, Preemption is determined by partition first, not node.
>
> Since your pending job is in the 'day' partition, it will not preempt
> something in the 'night' partition (even if the node is in both).
>
> Brian Andrus
>
> On 8/19/2021 2:49 PM, Russell Jones wrote:
>
> Hi all,
>
> I could use some help to understand why preemption is not working for me
> properly. I have a job blocking other jobs that doesn't make sense to me.
> Any assistance is appreciated, thank you!
>
> I have two partitions defined in slurm, a day time and a night time
> partition:
>
> Day partition - PriorityTier of 5, always Up. Limited resources under this
> QOS.
> Night partition - PriorityTier of 5 during night time, during day time set
> to Down and PriorityTier changed to 1. Jobs can be submitted to night queue
> for an unlimited QOS as long as resources are available.
>
> The thought here is jobs can continue to run in the night partition, even
> during the day time, until resources are requested from the day partition.
> Jobs would then be requeued/canceled in the night partition to
> satisfy those requirements.
>
> Current output of "scontrol show part":
>
> PartitionName=day
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=NO QoS=part_day
>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
>    Hidden=NO
>    MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=0 LLN=NO
>    MaxCPUsPerNode=UNLIMITED
>    Nodes=cluster-r1n[01-13],cluster-r2n[01-08]
>    PriorityJobFactor=1 PriorityTier=5 RootOnly=NO ReqResv=NO
>    OverSubscribe=NO
>    OverTimeLimit=NONE PreemptMode=REQUEUE
>    State=UP TotalCPUs=336 TotalNodes=21 SelectTypeParameters=NONE
>    JobDefaults=(null)
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> PartitionName=night
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=NO QoS=part_night
>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
>    Hidden=NO
>    MaxNodes=22 MaxTime=7-00:00:00 MinNodes=0 LLN=NO
>    MaxCPUsPerNode=UNLIMITED
>    Nodes=cluster-r1n[01-13],cluster-r2n[01-08]
>    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO
>    OverSubscribe=NO
>    OverTimeLimit=NONE PreemptMode=REQUEUE
>    State=DOWN TotalCPUs=336 TotalNodes=21 SelectTypeParameters=NONE
>    JobDefaults=(null)
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> I currently have a job in the night partition that is blocking jobs in the
> day partition, even though the day partition has a PriorityTier of 5, and
> night partition is Down with a PriorityTier of 1.
>
> My current slurm.conf preemption settings are:
>
> PreemptMode=REQUEUE
> PreemptType=preempt/partition_prio
>
> The blocking job's scontrol show job output is:
>
> JobId=105713 JobName=jobname
>    Priority=1986 Nice=0 Account=xxx QOS=normal
>    JobState=RUNNING Reason=None Dependency=(null)
>    Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>    RunTime=17:49:39 TimeLimit=7-00:00:00 TimeMin=N/A
>    SubmitTime=2021-08-18T22:36:36 EligibleTime=2021-08-18T22:36:36
>    AccrueTime=2021-08-18T22:36:36
>    StartTime=2021-08-18T22:36:39 EndTime=2021-08-25T22:36:39 Deadline=N/A
>    PreemptEligibleTime=2021-08-18T22:36:39 PreemptTime=None
>    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-08-18T22:36:39
>    Partition=night AllocNode:Sid=cluster-1:1341505
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList=cluster-r1n[12-13],cluster-r2n[04-06]
>    BatchHost=cluster-r1n12
>    NumNodes=5 NumCPUs=80 NumTasks=5 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    TRES=cpu=80,node=5,billing=80,gres/gpu=20
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>    Features=(null) DelayBoot=00:00:00
>    OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
>
> The job that is being blocked:
>
> JobId=105876 JobName=bash
>    Priority=2103 Nice=0 Account=xxx QOS=normal
>    JobState=PENDING
>    Reason=Nodes_required_for_job_are_DOWN,_DRAINED_or_reserved_for_jobs_in_higher_priority_partitions
>    Dependency=(null)
>    Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
>    RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
>    SubmitTime=2021-08-19T16:19:23 EligibleTime=2021-08-19T16:19:23
>    AccrueTime=2021-08-19T16:19:23
>    StartTime=Unknown EndTime=Unknown Deadline=N/A
>    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-08-19T16:26:43
>    Partition=day AllocNode:Sid=cluster-1:2776451
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList=(null)
>    NumNodes=3 NumCPUs=40 NumTasks=40 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    TRES=cpu=40,node=1,billing=40
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
>    Features=(null) DelayBoot=00:00:00
>    OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
>
> Why is the day job not preempting the night job?
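
For anyone trying to reproduce this, the setup quoted above would correspond roughly to the slurm.conf lines below. This is only a sketch reconstructed from the "scontrol show part" output and the preemption settings in the original message, not a copy of the real file; options left at their defaults are omitted, and the night partition is shown in its night-time state. The two scontrol commands in the comments are an assumption about how the day/night switch might be applied, since the thread does not say how it is triggered.

```
# Sketch only: reconstructed from the quoted scontrol output, not the real slurm.conf.
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# Day partition: always up, PriorityTier 5, limited by the part_day QOS.
PartitionName=day Nodes=cluster-r1n[01-13],cluster-r2n[01-08] QOS=part_day MaxTime=1-00:00:00 PriorityTier=5 PreemptMode=REQUEUE State=UP

# Night partition as configured for night time: same nodes, PriorityTier 5.
PartitionName=night Nodes=cluster-r1n[01-13],cluster-r2n[01-08] QOS=part_night MaxNodes=22 MaxTime=7-00:00:00 PriorityTier=5 PreemptMode=REQUEUE State=UP

# Assumed day/night switch (the triggering mechanism is not given in the thread):
#   morning: scontrol update PartitionName=night State=DOWN PriorityTier=1
#   evening: scontrol update PartitionName=night State=UP PriorityTier=5
```

With the daytime values applied, the night partition matches the State=DOWN, PriorityTier=1 output shown in the quoted message, which is the state in which the day job stays pending.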