There is one reason to have a hard time limit: garbage collection. If your
institution is like mine, visiting academics often come and go, and many
full-timers are "forgetful" about what they start up. At some point someone
has to clean all that up.
On Tue, Jun 17, 2025 at 8:18 AM Davide DelVento via slurm-users
<slurm-users@lists.schedmd.com> wrote:
>
> This conversation is drifting a bit away from my initial questions and
> covering various other related topics. In fact I do agree with almost
> everything written in the last few messages. However, that is somewhat
> orthogonal to my initial request, which I now understand has the answer
> "not possible with slurm configuration, possible with ugly hacks which
> are probably error prone and not worth the hassle". Just for the sake of
> the discussion (since I'm enjoying hearing the various perspectives),
> I'll restate my request and why I think slurm does not support this need.
>
> Most clusters have very high utilization all the time. This is good for
> ROI etc. but annoying to users. Forcing users to specify a firm wallclock
> limit helps slurm make good scheduling decisions, which keep utilization
> (ROI, etc.) high and minimize wait time for everybody.
>
> At the place where I work there is a quite different situation: there are
> moments of high pressure and long waits, and there are moments in which
> utilization drops under 50% and sometimes even under 25% (e.g. during
> long weekends). We can have a discussion about it, but the bottom line is
> that management (ROI, etc.) is fine with it, so that's the way it is.
> This circumstance, I agree, is quite peculiar and not shared by any other
> place I worked before or where I ever had an account and saw how things
> were, but it is what it is. In this circumstance it feels at least silly,
> and perhaps even extremely wasteful and annoying, to let slurm cancel
> jobs at their wallclock limit without considering other context. I mean,
> imagine a user with a week-long job who estimated a 7-day wallclock limit
> and "for good measure" requested 8 days, but whose job would actually
> take 9 days. Imagine that the 8th day happened in the middle of a long
> weekend when utilization was 25% and there was not a single other job
> pending. Maybe this job is a one-off experiment quickly cobbled together
> to test one thing, so it's not a well-designed piece of code and does not
> have checkpoint-restart capabilities. Why enforce the wallclock limit in
> that situation?
>
> The way around this problem in the past was to simply not make the
> wallclock limit mandatory (which was decided by my predecessor, who has
> now left). That worked, but only because the cluster was not in a very
> good usability state, so most people avoided it anyway and there seldom
> was a long line of jobs pending in the queue, so slurm did not need to
> work very hard to schedule things. Now that I've improved the usability
> situation, this has become a problem, because utilization has become much
> higher. Perhaps in a short time people will learn to plan ahead, submit
> more jobs and fill the machine up during the weekends too (I'm working on
> user education towards that), and if that happens, it will make the above
> dilemma go away. But for now I have it.
>
> I'm still mulling over how best to proceed. Maybe just force the users to
> set a wallclock limit and live with it.
>
> Here is another idea that just came to me. Does slurm have a "global"
> switch to turn on/off cancelling jobs hitting their wallclock limit? If
> so, I could have a cron job checking whether there are pending jobs in
> the queue: if not, shut it off; if so, turn it on. Granted, that may be
> sloppy (e.g. one job pending for one resource causing the cancelling of
> jobs using other resources), but it's something, and it'd be easy to
> implement compared to turning pre-emption on/off as discussed in a
> previous message.
>
> Great conversation folks, enjoying reading the various perspectives at
> different sites!
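One way to approximate the cron idea above, without relying on any global
switch, would be to periodically extend the limits of running jobs while
nothing is pending. A rough, untested Lua sketch; it assumes it runs as root
or SlurmUser (only an administrator can raise a job's TimeLimit), and the
squeue format strings, the "under an hour left" test and the 60-minute bump
are my own choices to adapt:

    -- Rough, untested sketch: run periodically, e.g. from root's cron.
    -- While nothing is pending, bump the TimeLimit of running jobs that
    -- have less than an hour left, instead of toggling a global switch.
    -- squeue/scontrol are standard Slurm tools; thresholds are assumptions.

    local function run(cmd)
      local out, pipe = {}, io.popen(cmd)
      for line in pipe:lines() do table.insert(out, line) end
      pipe:close()
      return out
    end

    -- If anything is pending, leave normal wallclock enforcement alone.
    if #run("squeue -h -t PENDING -o %i") > 0 then
      return
    end

    -- Otherwise give nearly-expired running jobs another 60 minutes.
    for _, entry in ipairs(run('squeue -h -t RUNNING -o "%i %L"')) do
      local jobid, left = entry:match("^(%S+)%s+(%S+)$")
      if jobid and left then
        local _, colons = left:gsub(":", "")
        -- "MM:SS" (one colon, no day field) means under an hour left.
        if colons == 1 and not left:find("-", 1, true) then
          os.execute("scontrol update JobId=" .. jobid .. " TimeLimit=+60")
        end
      end
    end

It inherits the sloppiness mentioned above: a single pending job, even one
waiting for unrelated resources, stops the extensions.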
> On Tue, Jun 17, 2025 at 12:26 AM Loris Bennett via slurm-users
> <slurm-users@lists.schedmd.com> wrote:
>>
>> Hi Prentice,
>>
>> Prentice Bisbal via slurm-users <slurm-users@lists.schedmd.com> writes:
>>
>> > I think the idea of having a generous default timelimit is the wrong
>> > way to go. In fact, I think any defaults for jobs are a bad way to go.
>> > The majority of your users will just use that default time limit, and
>> > backfill scheduling will remain useless to you.
>>
>> Horses for courses, I would say. We have a default time of 14 days, but
>> because we also have QoS with increased priority but shorter time
>> limits, there is still an incentive for users to set the time limit
>> themselves. So currently we have around 900 jobs running, only 100 of
>> which are using the default time limit. Many of these will be
>> long-running Gaussian jobs and will indeed need the time.
>>
>> > Instead, I recommend you use your job_submit.lua to reject all jobs
>> > that don't have a wallclock time and print out a helpful error message
>> > to inform users they now need to specify a wallclock time, and provide
>> > a link to documentation on how to do that.
>> >
>> > Requiring users to specify a time limit themselves does two things:
>> >
>> > 1. It reminds them that it's important to be conscious of timelimits
>> > when submitting jobs
>>
>> This is a good point. We use 'jobstats', which provides information
>> after a job has completed about run time relative to time limit, amongst
>> other things, although unfortunately many people don't seem to read
>> this. However, even if you do force people to set a time limit, they can
>> still choose not to think about it and just set the maximum.
>>
>> > 2. If a job is killed before it's done and all the progress is lost
>> > because the job wasn't checkpointing, they can't blame you as the
>> > admin.
>>
>> I don't really understand this point. The limit is just the way it is,
>> just as we have caps on the total number of cores or GPUs a given user
>> can use at any one time. Up to now no-one has blamed us for this.
>>
>> > If you do this, it's easy to get the users on board by first providing
>> > useful and usable documentation on why timelimits are needed and how
>> > to set them. Be sure to hammer home the point that effective
>> > timelimits can lead to their jobs running sooner, and that effective
>> > timelimits can increase cluster efficiency/utilization, helping them
>> > get a better return on their investment (if they contribute to the
>> > cluster's cost) or they'll get more science done. I like to frame it
>> > that accurate wallclock times will give them a competitive edge in
>> > getting their jobs running before other cluster users. Everyone likes
>> > to think what they're doing will give them an advantage!
>>
>> I agree with all this and it is also what we try to do. The only thing I
>> don't concur with is your last sentence. In my experience, as long as
>> things work, users will in general not give a fig about whether they are
>> using resources efficiently. Only when people notice a delay in jobs
>> starting do they become more aware of it and are prepared to take
>> action. It is particularly a problem with new users, because fairshare
>> means that their jobs will start pretty quickly, no matter how
>> inefficiently they have configured them. Maybe we should just give new
>> users fewer shares initially and only later bump them up to some
>> standard value.
>>
>> Cheers,
>>
>> Loris
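For reference, a minimal job_submit.lua along the lines Prentice describes
above might look like the following. Untested sketch; the field and constant
names (job_desc.time_limit, slurm.NO_VAL, slurm.log_user) are as I recall
them from the job_submit/lua examples, and the documentation URL is a
placeholder:

    -- Minimal, untested sketch: reject jobs submitted without --time.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        -- time_limit is in minutes; NO_VAL means the user did not set one.
        if job_desc.time_limit == slurm.NO_VAL then
            slurm.log_user("Please specify a time limit with --time; " ..
                           "see https://hpc.example.org/docs/timelimits")
            return slurm.ERROR  -- a more specific ESLURM code could be used
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end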
>>
>> > My 4 cents (adjusted for inflation).
>> >
>> > Prentice
>> >
>> > On 6/12/25 9:11 PM, Davide DelVento via slurm-users wrote:
>> >
>> > Sounds good, thanks for confirming it.
>> > Let me sleep on it wrt the "too many" QOS, or think about whether I
>> > should ditch this idea.
>> > If I implement it, I'll post details in this conversation on how I
>> > did it.
>> > Cheers
>> >
>> > On Thu, Jun 12, 2025 at 6:59 AM Ansgar Esztermann-Kirchner
>> > <aesz...@mpinat.mpg.de> wrote:
>> >
>> > On Thu, Jun 12, 2025 at 04:52:24AM -0600, Davide DelVento wrote:
>> > > Hi Ansgar,
>> > >
>> > > This is indeed what I was looking for: I was not aware of
>> > > PreemptExemptTime.
>> > >
>> > > From my cursory glance at the documentation, it seems that
>> > > PreemptExemptTime is QOS-based and not job-based though. Is that
>> > > correct? Or could it be set per-job, perhaps in a prolog/submit lua
>> > > script?
>> >
>> > Yes, that's correct.
>> > I guess you could create a bunch of QOS with different
>> > PreemptExemptTimes and then let the user select one (or indeed select
>> > it from lua), but as far as I know, there is no way to set arbitrary
>> > per-job values.
>> >
>> > Best,
>> >
>> > A.
>> > --
>> > Ansgar Esztermann
>> > Sysadmin Dep. Theoretical and Computational Biophysics
>> > https://www.mpinat.mpg.de/person/11315/3883774
>> >
>> > --
>> > Prentice Bisbal
>> > HPC Systems Engineer III
>> > Computational & Information Systems Laboratory (CISL)
>> > NSF National Center for Atmospheric Research (NSF NCAR)
>> > https://www.cisl.ucar.edu
>> > https://ncar.ucar.edu
>> --
>> Dr. Loris Bennett (Herr/Mr)
>> FUB-IT, Freie Universität Berlin
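And since the quoted thread above ends with the idea of a handful of QOS
carrying different PreemptExemptTime values, selected from lua: a sketch of
how that selection could look in the same job_submit.lua. The QOS names and
time thresholds are made up; each QOS would have to be created beforehand
with sacctmgr and given its own PreemptExemptTime. Untested:

    -- Untested sketch: map the requested wallclock time to one of a few
    -- pre-created QOS (hypothetical names) with different PreemptExemptTime.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.qos == nil and job_desc.time_limit ~= slurm.NO_VAL then
            if job_desc.time_limit <= 240 then        -- up to 4 hours
                job_desc.qos = "exempt-short"
            elseif job_desc.time_limit <= 2880 then   -- up to 2 days
                job_desc.qos = "exempt-medium"
            else
                job_desc.qos = "exempt-long"
            end
        end
        return slurm.SUCCESS
    end

In practice this would be folded into the same slurm_job_submit function as
the time-limit check sketched earlier.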