IMO the recommended method does not work well for jobs that already have a start time in the future, because releasing them with StartTime=now throws away the original begin time. It also does not change the Reason to anything that explicitly says the start time was changed to put the job on hold. That makes it hard to identify such jobs and release them, since the start time might have been set for other reasons. A "magic number" start time that is easy to identify and unlikely to be a real value would be more useful than something like "now+duration". Alternatively, also setting a comment field that marks the job as held would help.
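A minimal shell sketch of that idea, assuming a sentinel begin time of 2099-12-31 and a Comment tag of ADMIN_HOLD. Both values are hypothetical, chosen for illustration, and are not Slurm conventions. It only uses `scontrol update` with the StartTime and Comment fields and the `squeue` format codes `%i` (job ID) and `%k` (comment). Note that `park_jobs` overwrites whatever the Comment field already holds, and some sites may already use that field for something else.

```shell
#!/bin/sh
# Sketch: "park" pending jobs with a recognizable sentinel instead of now+30days.
# HOLD_START and HOLD_TAG are hypothetical values, not Slurm conventions.
HOLD_START="2099-12-31T00:00:00"   # far-future begin time unlikely to be real
HOLD_TAG="ADMIN_HOLD"              # marker written to the job's Comment field

# Push each job's begin time to the sentinel and tag its Comment field.
# Caution: this overwrites any existing Comment on the job.
park_jobs() {
    for id in "$@"; do
        scontrol update JobId="$id" StartTime="$HOLD_START" Comment="$HOLD_TAG"
    done
}

# Read "JOBID|COMMENT" lines on stdin and print the IDs whose whole
# comment equals the tag (comments that contain '|' are not mismatched).
tagged_ids() {
    awk -v tag="$HOLD_TAG" '{
        p = index($0, "|")                        # position of first separator
        if (p > 0 && substr($0, p + 1) == tag)    # compare the full comment
            print substr($0, 1, p - 1)            # emit the job ID
    }'
}

# Find pending jobs carrying the tag and let them start now.
# The tag is left in place; clearing the Comment is a site-specific choice.
unpark_jobs() {
    squeue --noheader --states=PENDING --format='%i|%k' | tagged_ids |
    while read -r id; do
        scontrol update JobId="$id" StartTime=now
    done
}
```

Usage would be something like `park_jobs 1234 1235` and later `unpark_jobs`. The Reason field still will not say the job is held, but `squeue -o '%i %k'` makes the parked jobs easy to spot. This sketch has not been checked against pending job arrays.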
I have not used the Priority attribute all that much yet. Is it a bug that releasing a job makes its Priority very high? Do other installations see that behavior? I have seen several mentions that users can only reduce the Priority of their own jobs.

Sent with [Proton Mail](https://proton.me/) secure email.

On Saturday, February 24th, 2024 at 9:44 PM, urbanjost via slurm-users <slurm-users@lists.schedmd.com> wrote:

> There are scontrol subcommands uhold/hold/release/requeuehold that are
> ignored when FAQ 21 describes how to place a job on hold, and it never
> explains why the method described there is the best one; it just states
> that it is. Does anyone know why the FAQ method is better than using the
> subcommands? Is it because the PRIORITY and/or NICE values are not altered
> (maybe)? The question also asks about running, but the answer only covers
> starting and not suspending, which is not quite as clear (I think
> "running" should be "starting" to make that clear, and/or how to suspend
> a job should be described as well).
>
> If the answer is not clear to anyone, I might turn this into a
> documentation change request in the Slurm bugzilla, but I wanted to see
> whether this was already clear to someone and I am just missing something.
>
> From FAQ:
>
> 21. How can I temporarily prevent a job from running (e.g. place it into a
> hold state)?
>
> The easiest way to do this is to change a job's earliest begin time
> (optionally set at job submit time using the --begin option). The example
> below places a job into hold state (preventing its initiation for 30 days)
> and later permitting it to start now.
>
> <METHOD I>
>     $ scontrol update JobId=1234 StartTime=now+30days
>     ... later ...
>     $ scontrol update JobId=1234 StartTime=now
>
> Note: Empirically, in METHOD I the JobId can be a <job_list>, although I
> initially thought it required a single JobId.
>
> No explanation is given for why METHOD I is best, and there are other
> methods that seem more intuitive. I wonder what is undesirable about the
> following method, which I have been using: the scontrol(1) subcommands
> hold/uhold/release/requeuehold.
>
> <METHOD II>
>     $ scontrol hold <job_list>      # admin hold: advantage is the user cannot release it
>     $ scontrol uhold <job_list>
>     $ scontrol release <job_list>
>
> Examples:
>     $ scontrol uhold jobname=JOB_NAME
>     $ scontrol uhold '[100-200],300,500'
>
> Using uhold, the "Reason" changes to something that clearly identifies the
> job as being held: "Reason=None" became "Reason=JobHeldUser", which
> seems better than METHOD I in that regard.
>
> The downside might be that PRIORITY changed to zero and then went to a
> very large value when released?
>
> Another method appears to be setting PRIORITY to zero, which also
> places jobs in hold.
>
> <METHOD III>
>     $ scontrol update jobid=373 Priority=0
>     $ scontrol release jobid=373                  # sets to a very high value
>     $ scontrol update jobid=373 Priority=11111    # put back to lower desired value
>
> Once lowered, is there an optional setting that prevents a user from
> raising PRIORITY again? The manual says:
>
>     Only the Slurm administrator or root can increase job's priority.
>
> At least on my machine, "release" sets the priority to a very high value,
> and a regular user can lower the value back to the (probably) lower
> original value.
>
> I did not see it happen, but some statements in the documentation make me
> think that not only PRIORITY but perhaps also the NICE value might be
> changed by METHOD II and METHOD III, although I could not get the NICE
> value to change inadvertently.
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com