Along those lines, there is also the slurm.conf setting _JobRequeue_, which controls whether batch jobs can be requeued by default.
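For example, something like this in slurm.conf (a sketch -- 1, the usual default, lets batch jobs be requeued unless they opt out; 0 disables requeueing by default):

    JobRequeue=1

Individual jobs can still override that default at submission time with "sbatch --requeue" or "sbatch --no-requeue".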
- Michael

On Fri, Mar 1, 2019 at 7:07 AM Thomas M. Payerle <paye...@umd.edu> wrote:

> My understanding is that with PreemptMode=requeue, the running scavenger job processes on the node will be killed, but the job will be placed back in the queue (assuming the job's specific parameters allow this; a job can have a --no-requeue flag set, in which case I assume it behaves the same as PreemptMode=cancel).
>
> When a job which has been requeued starts up a second (or Nth) time, I believe Slurm basically just reruns the job script. If the job did not do any checkpointing, this means the job starts from the very beginning. If the job does checkpointing in some fashion, then depending on how the checkpointing was implemented and the cluster environment, the script might or might not have to check for the existence of checkpointing data in order to resume at the last checkpoint.
>
> On Fri, Mar 1, 2019 at 7:24 AM david baker <djbake...@gmail.com> wrote:
>
>> Hello,
>>
>> Following up on implementing preemption in Slurm. Thank you again for all the advice. After a short break I've been able to run some basic experiments. Initially, I have kept things very simple and made the following changes in my slurm.conf...
>>
>> # Preemption settings
>> PreemptType=preempt/partition_prio
>> PreemptMode=requeue
>>
>> PartitionName=relgroup nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=02:00:00 MaxTime=60:00:00 QOS=relgroup State=UP AllowAccounts=relgroup Priority=10 PreemptMode=off
>>
>> # Scavenger partition
>> PartitionName=scavenger nodes=red[465-470] ExclusiveUser=YES MaxCPUsPerNode=40 DefaultTime=00:15:00 MaxTime=02:00:00 QOS=scavenger State=UP AllowGroups=jfAccessToIridis5 PreemptMode=requeue
>>
>> The nodes in the relgroup queue are owned by the General Relativity group and, of course, they have priority access to these nodes. The general population can scavenge these nodes via the scavenger queue. When I use "preemptmode=cancel" I'm happy that the relgroup jobs can preempt the scavenger jobs (and the scavenger jobs are cancelled). When I set the preempt mode to "requeue" I see that the scavenger jobs are still cancelled/killed. Have I missed an important configuration change, or is it that lower-priority jobs will always be killed and not re-queued?
>>
>> Could someone please advise me on this issue? Also, I'm wondering if I really understand the "requeue" option. Does that mean re-queued and run from the beginning, or run from the current state (needing checkpointing)?
>>
>> Best regards,
>> David
>>
>> On Tue, Feb 19, 2019 at 2:15 PM Prentice Bisbal <pbis...@pppl.gov> wrote:
>>
>>> I just set this up a couple of weeks ago myself. Creating two partitions is definitely the way to go. I created one partition, "general", for normal general-access jobs, and another, "interruptible", for general-access jobs that can be interrupted, and then set PriorityTier accordingly in my slurm.conf file (node names omitted for clarity/brevity):
>>>
>>> PartitionName=general Nodes=... MaxTime=48:00:00 State=Up PriorityTier=10 QOS=general
>>> PartitionName=interruptible Nodes=... MaxTime=48:00:00 State=Up PriorityTier=1 QOS=interruptible
>>>
>>> I then set PreemptMode=Requeue, because I'd rather have jobs requeued than suspended. And it's been working great. There are a few other settings I had to change.
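[Interjecting in the quoted thread: the pieces Prentice describes would fit together in slurm.conf roughly as below. This is only a sketch -- the node names, times, and partition names are placeholders, not his actual configuration.

    # Global preemption settings
    PreemptType=preempt/partition_prio
    PreemptMode=REQUEUE

    # Jobs in the higher PriorityTier partition preempt jobs in the lower one
    PartitionName=general       Nodes=node[001-004] MaxTime=48:00:00 State=UP PriorityTier=10 PreemptMode=OFF
    PartitionName=interruptible Nodes=node[001-004] MaxTime=48:00:00 State=UP PriorityTier=1  PreemptMode=REQUEUE

Note that with PreemptMode=REQUEUE a preempted job is only requeued if it is requeue-able (JobRequeue=1, or sbatch --requeue); otherwise Slurm falls back to cancelling it.]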
>>> The best documentation for all the settings you need to change is https://slurm.schedmd.com/preempt.html
>>>
>>> Everything has been working exactly as desired and advertised. My users who needed the ability to run low-priority, long-running jobs are very happy.
>>>
>>> The one caveat is that jobs that will be killed and requeued need to support checkpoint/restart. So when this becomes a production thing, users are going to have to acknowledge that they will only use this partition for jobs that have some sort of checkpoint/restart capability.
>>>
>>> Prentice
>>>
>>> On 2/15/19 11:56 AM, david baker wrote:
>>>
>>> Hi Paul, Marcus,
>>>
>>> Thank you for your replies. Using partition priority all makes sense. I was thinking of doing something similar with a set of nodes purchased by another group. That is, having a private high-priority partition and a lower-priority "scavenger" partition for the public. In this case scavenger jobs will get killed when preempted.
>>>
>>> In the present case, I did wonder if it would be possible to do something with just a single partition -- hence my question. Your replies have convinced me that two partitions will work -- with preemption leading to re-queued jobs.
>>>
>>> Best regards,
>>> David
>>>
>>> On Fri, Feb 15, 2019 at 3:09 PM Paul Edmon <ped...@cfa.harvard.edu> wrote:
>>>
>>>> Yup, PriorityTier is what we use to do exactly that here. That said, unless you turn on preemption, jobs may still pend if there is no space. We run with REQUEUE on, which has worked well.
>>>>
>>>> -Paul Edmon-
>>>>
>>>> On 2/15/19 7:19 AM, Marcus Wagner wrote:
>>>>
>>>> Hi David,
>>>>
>>>> As far as I know, you can use the PriorityTier (partition parameter) to achieve this. According to the manpages (if I remember right), jobs from higher priority tier partitions have precedence over jobs from lower priority tier partitions, without taking the normal fairshare priority into consideration.
>>>>
>>>> Best,
>>>> Marcus
>>>>
>>>> On 2/15/19 10:07 AM, David Baker wrote:
>>>>
>>>> Hello.
>>>>
>>>> We have a small set of compute nodes owned by a group. The group has agreed that the rest of the HPC community can use these nodes provided that they (the owners) can always have priority access to the nodes. The four nodes are well provisioned (1 TByte of memory each plus 2 GRID K2 graphics cards), so there is no need to worry about preemption. In fact, I'm happy for the nodes to be used as well as possible by all users. It's just that jobs from the owners must take priority if resources are scarce.
>>>>
>>>> What is the best way to achieve the above in Slurm? I'm planning to place the nodes in their own partition. The node owners will have priority access to the nodes in that partition, but will have no advantage when submitting jobs to the public resources. Does anyone have any ideas on how to deal with this?
>>>>
>>>> Best regards,
>>>> David
>>>>
>>>> --
>>>> Marcus Wagner, Dipl.-Inf.
>>>>
>>>> IT Center
>>>> Abteilung: Systeme und Betrieb
>>>> RWTH Aachen University
>>>> Seffenter Weg 23
>>>> 52074 Aachen
>>>> Tel: +49 241 80-24383
>>>> Fax: +49 241 80-624383
>>>> wag...@itc.rwth-aachen.de
>>>> www.itc.rwth-aachen.de
>
> --
> Tom Payerle
> DIT-ACIGS/Mid-Atlantic Crossroads    paye...@umd.edu
> 5825 University Research Park        (301) 405-6135
> University of Maryland
> College Park, MD 20740-3831
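P.S. Regarding the checkpoint/restart caveat Thomas and Prentice raise above: since a requeued job simply reruns its batch script from the top, the script itself has to decide whether to start fresh or resume. A minimal sketch -- the application name, resume flag, and checkpoint file name below are invented for illustration:

    #!/bin/bash
    #SBATCH --requeue
    #SBATCH --open-mode=append   # keep appending to the same output file across requeues

    # Slurm reruns this script after a requeue, so check for checkpoint
    # data left behind by the previous (preempted) run.
    if [ -f checkpoint.dat ]; then
        ./my_app --resume-from checkpoint.dat   # hypothetical resume option
    else
        ./my_app                                # first run starts from scratch
    fi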