IMO the recommended method does not work well for jobs that already have a 
start time in the future, and it does not change the Reason to something that 
explicitly tells you the start time was changed to put the job on hold. That 
makes it hard to identify such jobs and release them, since the start time 
might have been set for other reasons. A "magic number" start time that is 
easy to identify and unlikely to be a genuine value would be more useful than 
something like "now+duration"; alternatively, setting a comment field 
indicating that the job is being held would help.
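
A rough sketch of that idea, not a tested recipe: the sentinel start time, 
the comment tags, and the function names below are all hypothetical choices 
of mine, not Slurm conventions. It assumes `scontrol update` accepts 
StartTime= and Comment= for the job, and that squeue's `%k` format field 
prints the job comment. Setting DRY_RUN prints the commands instead of 
running them.

```shell
#!/bin/sh
# Hypothetical sentinel values; neither is defined or recognized by Slurm.
HOLD_START="2037-01-01T00:00:00"
HOLD_TAG="HELD_BY_STARTTIME"

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Defer a job to the sentinel start time and tag it in the comment field.
# Caution: this overwrites any comment the job already carries.
hold_by_starttime() {
    run scontrol update "JobId=$1" "StartTime=$HOLD_START" "Comment=$HOLD_TAG"
}

# Read "jobid|comment" lines on stdin and print the IDs of tagged jobs.
tagged_job_ids() {
    awk -F'|' -v tag="$HOLD_TAG" '$2 == tag { print $1 }'
}

# Let every tagged pending job start now, and replace the tag so the job
# is not matched again on the next pass.
release_tagged() {
    squeue -h -t PD -o '%i|%k' | tagged_job_ids | while read -r id; do
        run scontrol update "JobId=$id" StartTime=now Comment=RELEASED
    done
}
```

With this, `hold_by_starttime 1234` would defer job 1234 and `release_tagged` 
would later release only the jobs carrying the tag, leaving alone any job 
whose start time was set in the future for other reasons. Jobs that already 
had a future start time still lose that value, so the original problem is 
only partly addressed.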

I have not used the Priority attribute all that much yet. Is it a bug that 
releasing a job makes the Priority very high? Do other installations see that 
behavior? I see several mentions of users only being able to reduce the 
Priority of their jobs.
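
For anyone who wants to check this on their own installation, here is a 
minimal sketch for comparing the values before and after a hold/release 
cycle. The helper name `job_field` is hypothetical; it only assumes that 
`scontrol show job` prints space-separated KEY=value pairs such as 
Priority=, Nice= and Reason=.

```shell
#!/bin/sh
# Extract the value of one KEY=value field from `scontrol show job`
# output read on stdin. Prints the first match only.
job_field() {
    tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# Intended use (job ID 373 is just the example ID from this thread):
#   before=$(scontrol show job 373 | job_field Priority)
#   scontrol uhold 373
#   scontrol release 373
#   after=$(scontrol show job 373 | job_field Priority)
#   echo "Priority before=$before after=$after"
```

The same helper can be used with Nice or Reason to see whether those fields 
change as well.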

On Saturday, February 24th, 2024 at 9:44 PM, urbanjost via slurm-users 
<slurm-users@lists.schedmd.com> wrote:

> The scontrol subcommands uhold/hold/release/requeuehold are ignored when 
> FAQ 21 describes how to place a job on hold, and the FAQ never explains why 
> the method it describes is the best one; it just states that it is. Does 
> anyone know why the FAQ method is better than using the subcommands? Is it 
> because the PRIORITY and/or NICE values are not altered (maybe)? The question 
> is also about running, but the answer covers only starting and not 
> suspending, which is not quite as clear (I think "running" should be 
> "starting" to make that clear, and/or how to suspend should be described 
> as well).
>
> If the answer is not clear to anyone, I might turn this into a documentation 
> change request in the Slurm bugzilla, but I wanted to see whether this is 
> already clear to anyone and I am missing something.
>
> From FAQ:
>
> 21. How can I temporarily prevent a job from running (e.g. place it into a 
> hold state)?
>
> The easiest way to do this is to change a job's earliest begin time
> (optionally set at job submit time using the --begin option). The example
> below places a job into hold state (preventing its initiation for 30 days)
> and later permitting it to start now.
>
> <METHOD I>
> $ scontrol update JobId=1234 StartTime=now+30days
> ... later ...
> $ scontrol update JobId=1234 StartTime=now
>
> Note: Empirically, the JobId in METHOD I can be a <job_list>; I initially
> thought it required a single job ID.
>
> No explanation is given of why METHOD I is best, and there are other methods
> that seem more intuitive. I wonder what is undesirable about the following 
> method, which I have been using: the scontrol(1) subcommands 
> hold/uhold/release/requeuehold.
>
> <METHOD II>
> $ scontrol hold <job_list> # advantage to administrator as user cannot change
> $ scontrol uhold <job_list>
> $ scontrol release <job_list>
>
> Examples:
> $ scontrol uhold jobname=JOB_NAME
> $ scontrol uhold '[100-200],300,500'
>
> Using uhold, the "Reason" changes to something that easily identifies the
> job as being held: "Reason=None" became "Reason=JobHeldUser", which
> seems better than METHOD I in that regard.
>
> The downside might be that PRIORITY changed to zero and then went to a
> very large value when the job was released?
>
> Another method appears to be setting PRIORITY to zero, which also
> places jobs on hold.
>
> <METHOD III>
> $ scontrol update jobid=373 Priority=0
> $ scontrol release jobid=373 # sets to a very high value
> $ scontrol update jobid=373 Priority=11111 # put back to lower desired value
>
> Once lowered, does an optional setting prevent a user from raising PRIORITY?
> The manual says:
>
> Only the Slurm administrator or root can increase job's priority.
>
> At least on my machine, the "release" puts the priority at a very high value, 
> and a regular user can lower the value back to the (probably) lower original 
> value.
>
> I did not see it happening but there are some statements in the documentation 
> that make me think not only PRIORITY but perhaps the NICE value might be 
> changed by METHOD II and METHOD III, although I could not get the NICE value 
> to be inadvertently changed.