Phil,
Does anyone have a working example using PreemptExemptTime?
My goal is to make a higher priority job wait 24 hours before actually
preempting a lower priority job. Put another way, any job is entitled to 24
hours of run time before being preempted. The preempted job should be
suspended, ideally. If requeue is necessary, that is OK.
We do and it is working as expected. Please see relevant snippets below.
[~] $ scontrol show config|grep Preempt
PreemptMode = CANCEL
PreemptType = preempt/qos
PreemptExemptTime = 00:00:00
                Name Priority PreemptExe
-------------------- -------- ----------
         interactive     1000   01:00:00
             preempt      500   01:00:00
       preempt_short      500   00:30:00
               rchii     1000   01:00:00
Details from my test cluster are below my signature. Any ideas on what I
should check or what I am missing? Maybe I misunderstood something.
I don't think you've missed anything. The only bit of information I can add is
that we previously were using GraceTime (which requires PreemptMode=CANCEL).
Unfortunately, depending on the application, it wouldn't always be clear that a
job was preempted in the application's output, or within the slurmctld logs.
When we switched to PreemptExemptTime, all application output and SLURM logs
stated preempted as the reason.
I know you want to suspend preempted jobs, but what happens if you cancel them
instead?
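A minimal sketch of that test, reusing the preempt/qos settings from your test cluster with only the preempt mode changed. It rests on an assumption worth checking against the slurm.conf man page for your Slurm version: as far as I recall, PreemptExemptTime is only honored for PreemptMode=REQUEUE and PreemptMode=CANCEL, which would be consistent with what you see under SUSPEND,GANG. This is untested on your setup:

```
# Sketch only: the preempt/qos experiment with the preempt mode swapped.
# Assumption: PreemptExemptTime is honored for REQUEUE and CANCEL, not SUSPEND.
PreemptType=preempt/qos
PreemptMode=REQUEUE          # or CANCEL, as on our cluster
PreemptExemptTime=00:03:00   # test value; 1-00:00:00 would be the 24-hour target
```

If the lower priority job then keeps running for its 3 minutes and is requeued only afterwards, the same fragment with the 24-hour value should cover your original goal, minus the suspend behaviour.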
HTH,
John DeSantis
On 2/2/22 14:12, Phil Kauffman wrote:
Does anyone have a working example using PreemptExemptTime?
My goal is to make a higher priority job wait 24 hours before actually
preempting a lower priority job. Put another way, any job is entitled to 24
hours of run time before being preempted. The preempted job should be
suspended, ideally. If requeue is necessary, that is OK.
It's been asked before here:
https://groups.google.com/g/slurm-users/c/mK4_M4hpXL8/m/sRhT53VYBQAJ
I've run through many iterations attempting to set `PreemptExemptTime`
in slurm.conf and in QOS.
Setting `PreemptType=preempt/partition_prio`:
- The preempted job gets suspended but `PreemptExemptTime` is ignored.
Setting `PreemptType=preempt/qos`:
- `PreemptExemptTime` is configured inside the QOS as well as globally in slurm.conf.
- `PreemptExemptTime` is respected, but both jobs continue to run at the
  same time using 200% of the resources, which is not wanted.
Details from my test cluster are below my signature. Any ideas on what I
should check or what I am missing? Maybe I misunderstood something.
Cheers,
Phil
In my tests I'm using 3 mins as the PreemptExemptTime.
# Nodes
NodeName=slurm[2-5] CPUs=1 Sockets=1 CoresPerSocket=1 ThreadsPerCore=2 RealMemory=1800 MemSpecLimit=200 State=UNKNOWN
### Experiment using PreemptType=preempt/qos
PartitionName=DEFAULT OverSubscribe=FORCE:1 Nodes=slurm[2-4]
PartitionName=active Default=YES QOS=normal
PartitionName=hipri Default=NO QOS=expedite
PreemptType=preempt/qos
PreemptMode=SUSPEND,GANG
PreemptExemptTime=00:03:00
SchedulerParameters=preempt_strict_order
PriorityType=priority/multifactor
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
# QOS
[root@slurm2 slurm-llnl]# sacctmgr show qos -p --noheader
normal|1|00:00:00||00:03:00|cluster|||1.000000||||||||||||||||||
expedite|2|00:00:00|normal|00:03:00|cluster|||1.000000||||||||||||||||||
### Experiment using PreemptType=preempt/partition_prio
PartitionName=low Default=NO OverSubscribe=NO PriorityTier=10 PreemptMode=requeue
PartitionName=med Default=NO OverSubscribe=FORCE:1 PriorityTier=20 PreemptMode=suspend
PartitionName=hi Default=NO OverSubscribe=FORCE:1 PriorityTier=30 PreemptMode=off
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
PreemptExemptTime=00:03:00
SchedulerParameters=preempt_strict_order
PriorityType=priority/multifactor
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory