There is one reason to have a hard time limit: garbage collection. If your
institution is like mine, visiting academics often come and go, and many
full-timers are "forgetful" about what they start up. At some point someone
has to clean all that up.
On Tue, Jun 17, 2025 at 8:18 AM Davide DelVento via slurm-users
<slurm-users@lists.schedmd.com> wrote:
>
> This conversation is drifting a bit away from my initial questions and
> covering various other related topics. In fact I do agree with almost
> everything written in the last few messages. However, that is somewhat
> orthogonal to my initial request, which I now understand has the answer
> "not possible with slurm configuration, possible with ugly hacks which
> are probably error prone and not worth the hassle". Just for the sake of
> the discussion (since I'm enjoying hearing the various perspectives),
> I'll restate my request and why I think slurm does not support this need.
>
> Most clusters have very high utilization all the time. This is good for
> ROI etc. but annoying to users. Forcing users to specify a firm wallclock
> limit helps slurm make good scheduling decisions, which keep utilization
> (ROI, etc.) high and minimize wait time for everybody.
>
> At the place where I work there is a quite different situation: there are
> moments of high pressure and long waits, and there are moments in which
> utilization drops under 50% and sometimes even under 25% (e.g. during
> long weekends). We can have a discussion about it, but the bottom line is
> that management (ROI, etc.) is fine with it, so that's the way it is.
> This circumstance, I agree, is quite peculiar and not shared by any other
> place I worked before or where I ever had an account and saw how things
> were, but it is what it is. In this circumstance it feels at least silly,
> and perhaps even extremely wasteful and annoying, to let slurm cancel
> jobs at their wallclock limit without considering other context. I mean,
> imagine a user with a week-long job who estimated a 7-day wallclock limit
> and "for good measure" requested 8 days, but whose job would actually
> take 9 days. Imagine that the 8th day happened in the middle of a long
> weekend when utilization was 25% and there was not a single other job
> pending. Maybe this job is a one-off experiment quickly cobbled together
> to test one thing, so it's not a well-designed piece of code and does not
> have checkpoint-restart capabilities. Why enforce the wallclock limit in
> that situation?
>
> The way around this problem in the past was to simply not make the
> wallclock limit mandatory (which was decided by my predecessor, who has
> now left). That worked, but only because the cluster was not in a very
> good usability state, so most people avoided it anyway and there seldom
> was a long line of jobs pending in the queue, so slurm did not need to
> work very hard to schedule things. Now that I've improved the usability
> situation, this has become a problem, because utilization has become much
> higher. Perhaps in a short time people will learn to plan ahead, submit
> more jobs and fill the machine up during the weekends too (I'm working on
> user education towards that), and if that happens, it will make the above
> dilemma go away. But for now I have it.
>
> I'm still mulling over how best to proceed. Maybe just force the users to
> set a wallclock limit and live with it.
>
> Here is another idea that just came to me. Does slurm have a "global"
> switch to turn on/off cancelling jobs hitting their wallclock limit? If
> so, I could have a cron job checking whether there are pending jobs in
> the queue: if not, shut it off; if so, turn it on. Granted, that may be
> sloppy (e.g. one job pending for one resource causing the cancelling of
> jobs using other resources), but it's something, and it'd be easy to
> implement compared to turning pre-emption on/off as discussed in a
> previous message.
>
> Great conversation folks, enjoying reading the various perspectives at
> different sites!
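One way to approximate the cron idea above, without relying on any global
switch, would be to periodically extend the limits of running jobs while
nothing is pending. A rough, untested Lua sketch; it assumes it runs as root
or SlurmUser (only an administrator can raise a job's TimeLimit), and the
squeue format strings, the "under an hour left" test and the 60-minute bump
are my own choices to adapt:

    -- Rough, untested sketch: run periodically, e.g. from root's cron.
    -- While nothing is pending, bump the TimeLimit of running jobs that
    -- have less than an hour left, instead of toggling a global switch.
    -- squeue/scontrol are standard Slurm tools; thresholds are assumptions.

    local function run(cmd)
      local out, pipe = {}, io.popen(cmd)
      for line in pipe:lines() do table.insert(out, line) end
      pipe:close()
      return out
    end

    -- If anything is pending, leave normal wallclock enforcement alone.
    if #run("squeue -h -t PENDING -o %i") > 0 then
      return
    end

    -- Otherwise give nearly-expired running jobs another 60 minutes.
    for _, entry in ipairs(run('squeue -h -t RUNNING -o "%i %L"')) do
      local jobid, left = entry:match("^(%S+)%s+(%S+)$")
      if jobid and left then
        local _, colons = left:gsub(":", "")
        -- "MM:SS" (one colon, no day field) means under an hour left.
        if colons == 1 and not left:find("-", 1, true) then
          os.execute("scontrol update JobId=" .. jobid .. " TimeLimit=+60")
        end
      end
    end

It inherits the sloppiness mentioned above: a single pending job, even one
waiting for unrelated resources, stops the extensions.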
> On Tue, Jun 17, 2025 at 12:26 AM Loris Bennett via slurm-users
> <slurm-users@lists.schedmd.com> wrote:
>>
>> Hi Prentice,
>>
>> Prentice Bisbal via slurm-users <slurm-users@lists.schedmd.com> writes:
>>
>> > I think the idea of having a generous default timelimit is the wrong
>> > way to go. In fact, I think any defaults for jobs are a bad way to go.
>> > The majority of your users will just use that default time limit, and
>> > backfill scheduling will remain useless to you.
>>
>> Horses for courses, I would say. We have a default time of 14 days, but
>> because we also have QoS with increased priority but shorter time
>> limits, there is still an incentive for users to set the time limit
>> themselves. So currently we have around 900 jobs running, only 100 of
>> which are using the default time limit. Many of these will be
>> long-running Gaussian jobs and will indeed need the time.
>>
>> > Instead, I recommend you use your job_submit.lua to reject all jobs
>> > that don't have a wallclock time and print out a helpful error message
>> > to inform users they now need to specify a wallclock time, and provide
>> > a link to documentation on how to do that.
>> >
>> > Requiring users to specify a time limit themselves does two things:
>> >
>> > 1. It reminds them that it's important to be conscious of timelimits
>> > when submitting jobs
>>
>> This is a good point. We use 'jobstats', which provides information
>> after a job has completed about run time relative to time limit, amongst
>> other things, although unfortunately many people don't seem to read
>> this. However, even if you do force people to set a time limit, they can
>> still choose not to think about it and just set the maximum.
>>
>> > 2. If a job is killed before it's done and all the progress is lost
>> > because the job wasn't checkpointing, they can't blame you as the
>> > admin.
>>
>> I don't really understand this point. The limit is just the way it is,
>> just as we have caps on the total number of cores or GPUs a given user
>> can use at any one time. Up to now no-one has blamed us for this.
>>
>> > If you do this, it's easy to get the users on board by first providing
>> > useful and usable documentation on why timelimits are needed and how
>> > to set them. Be sure to hammer home the point that effective
>> > timelimits can lead to their jobs running sooner, and that effective
>> > timelimits can increase cluster efficiency/utilization, helping them
>> > get a better return on their investment (if they contribute to the
>> > cluster's cost) or they'll get more science done. I like to frame it
>> > that accurate wallclock times will give them a competitive edge in
>> > getting their jobs running before other cluster users. Everyone likes
>> > to think what they're doing will give them an advantage!
>>
>> I agree with all this and it is also what we try to do. The only thing I
>> don't concur with is your last sentence. In my experience, as long as
>> things work, users will in general not give a fig about whether they are
>> using resources efficiently. Only when people notice a delay in jobs
>> starting do they become more aware of it and are prepared to take
>> action. It is particularly a problem with new users, because fairshare
>> means that their jobs will start pretty quickly, no matter how
>> inefficiently they have configured them. Maybe we should just give new
>> users fewer shares initially and only later bump them up to some
>> standard value.
>>
>> Cheers,
>>
>> Loris
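For reference, a minimal job_submit.lua along the lines Prentice describes
above might look like the following. Untested sketch; the field and constant
names (job_desc.time_limit, slurm.NO_VAL, slurm.log_user) are as I recall
them from the job_submit/lua examples, and the documentation URL is a
placeholder:

    -- Minimal, untested sketch: reject jobs submitted without --time.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        -- time_limit is in minutes; NO_VAL means the user did not set one.
        if job_desc.time_limit == slurm.NO_VAL then
            slurm.log_user("Please specify a time limit with --time; " ..
                           "see https://hpc.example.org/docs/timelimits")
            return slurm.ERROR  -- a more specific ESLURM code could be used
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end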
>>
>> > My 4 cents (adjusted for inflation).
>> >
>> > Prentice
>> >
>> > On 6/12/25 9:11 PM, Davide DelVento via slurm-users wrote:
>> >
>> > Sounds good, thanks for confirming it.
>> > Let me sleep on it wrt the "too many" QOS, or think about whether I
>> > should ditch this idea.
>> > If I implement it, I'll post details in this conversation on how I
>> > did it.
>> > Cheers
>> >
>> > On Thu, Jun 12, 2025 at 6:59 AM Ansgar Esztermann-Kirchner
>> > <aesz...@mpinat.mpg.de> wrote:
>> >
>> > On Thu, Jun 12, 2025 at 04:52:24AM -0600, Davide DelVento wrote:
>> > > Hi Ansgar,
>> > >
>> > > This is indeed what I was looking for: I was not aware of
>> > > PreemptExemptTime.
>> > >
>> > > From my cursory glance at the documentation, it seems that
>> > > PreemptExemptTime is QOS-based and not job-based though. Is that
>> > > correct? Or could it be set per-job, perhaps in a prolog/submit lua
>> > > script?
>> >
>> > Yes, that's correct.
>> > I guess you could create a bunch of QOS with different
>> > PreemptExemptTimes and then let the user select one (or indeed select
>> > it from lua), but as far as I know, there is no way to set arbitrary
>> > per-job values.
>> >
>> > Best,
>> >
>> > A.
>> > --
>> > Ansgar Esztermann
>> > Sysadmin Dep. Theoretical and Computational Biophysics
>> > https://www.mpinat.mpg.de/person/11315/3883774
>> >
>> > --
>> > Prentice Bisbal
>> > HPC Systems Engineer III
>> > Computational & Information Systems Laboratory (CISL)
>> > NSF National Center for Atmospheric Research (NSF NCAR)
>> > https://www.cisl.ucar.edu
>> > https://ncar.ucar.edu
>> --
>> Dr. Loris Bennett (Herr/Mr)
>> FUB-IT, Freie Universität Berlin
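And since the quoted thread above ends with the idea of a handful of QOS
carrying different PreemptExemptTime values, selected from lua: a sketch of
how that selection could look in the same job_submit.lua. The QOS names and
time thresholds are made up; each QOS would have to be created beforehand
with sacctmgr and given its own PreemptExemptTime. Untested:

    -- Untested sketch: map the requested wallclock time to one of a few
    -- pre-created QOS (hypothetical names) with different PreemptExemptTime.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.qos == nil and job_desc.time_limit ~= slurm.NO_VAL then
            if job_desc.time_limit <= 240 then        -- up to 4 hours
                job_desc.qos = "exempt-short"
            elseif job_desc.time_limit <= 2880 then   -- up to 2 days
                job_desc.qos = "exempt-medium"
            else
                job_desc.qos = "exempt-long"
            end
        end
        return slurm.SUCCESS
    end

In practice this would be folded into the same slurm_job_submit function as
the time-limit check sketched earlier.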