[slurm-users] Re: Feature request: Max Jobs Per Minute

2024-09-16 Thread John Hanks via slurm-users
I left out a critical let task+=1 in that pseudo-loop 😁 griznog
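
A minimal sketch of that corrected batching loop, assuming the job is an sbatch array script; the array range, batch size, and run_one_task command are invented placeholders:

    #!/bin/bash
    #SBATCH --array=0-4999            # e.g. 50,000 short tasks packed 10 per element
    batch_size=10
    task=$(( SLURM_ARRAY_TASK_ID * batch_size ))
    end=$(( task + batch_size ))
    while [ "$task" -lt "$end" ]; do
        ./run_one_task "$task"        # hypothetical per-task command
        let task+=1                   # the increment the original pseudo-loop left out
    done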

[slurm-users] Re: Feature request: Max Jobs Per Minute

2024-09-15 Thread John Hanks via slurm-users
No ideas on fixing this in Slurm, but in userspace in the past when faced with huge array jobs which had really short jobs like this I've nudged them toward batching up array elements in each job to extend it. Say the user wants to run 5 tasks, 30 seconds each. Batching those up in groups of 10

Re: [slurm-users] Allow SFTP on a specific compute node

2022-07-12 Thread John Hanks
Hi Fritz, Purely theoretical and untested solution, but it may work to "cp /usr/bin/sshd /usr/bin/sshd2" and then use that sshd2 binary to run an sshd service on a different port, with a config limiting it to sftp only and a `/etc/pam.d/sshd2` file that does not enforce pam_slurm_adopt. Downside i
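
A rough sketch of that idea (untested, as the post says): the port, config path, and contents below are assumptions, and sshd typically lives in /usr/sbin. The premise is that PAM derives its service name from the binary name, so the renamed copy consults /etc/pam.d/sshd2 instead of /etc/pam.d/sshd:

    cp /usr/sbin/sshd /usr/sbin/sshd2
    cat > /etc/ssh/sshd2_config <<'EOF'
    Port 2222
    PidFile /run/sshd2.pid
    # limit every login on this instance to SFTP
    Subsystem sftp internal-sftp
    ForceCommand internal-sftp
    EOF
    /usr/sbin/sshd2 -f /etc/ssh/sshd2_config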

Re: [slurm-users] what is the elegant way to drain node from epilog with self-defined reason?

2022-05-03 Thread John Hanks
I've done similar by having the epilog touch a file, then have the node health check (LBNL NHC) act on that file's presence/contents later to do the heavy lifting. There's a window of time/delay where the reason is "Epilog error" before the health check corrects it, but if that's tolerable this mak
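
A sketch of that shape, assuming LBNL NHC's usual conventions; the flag file path and check name are made up:

    # epilog fragment: record the reason, then let Slurm drain the node
    if ! site_post_job_check; then            # hypothetical site-specific test
        echo "post-job check failed (job $SLURM_JOB_ID)" > /run/drain_reason
        exit 1                                # drains the node with reason "Epilog error"
    fi

    # custom NHC check (dropped into /etc/nhc/scripts/, enabled in nhc.conf
    # with a line like:  * || check_drain_flag)
    function check_drain_flag() {
        if [ -e /run/drain_reason ]; then
            die 1 "$(cat /run/drain_reason)"  # re-drains with the real reason
        fi
        return 0
    }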

Re: [slurm-users] [EXTERNAL] Re: Managing shared memory (/dev/shm) usage per job?

2022-04-06 Thread John Hanks
Thanks, Greg! This looks like the right way to do this. I will have to stop putting off learning to use spank plugins :) griznog On Wed, Apr 6, 2022 at 1:40 AM Greg Wickham wrote: > Hi John, Mark, > We use a spank plugin https://gitlab.com/greg.wickham/slurm-spank-private-tmpdir (this w

Re: [slurm-users] Managing shared memory (/dev/shm) usage per job?

2022-04-05 Thread John Hanks
I've thought-experimented this in the past, wanting to do the same thing, but haven't found any way to get a /dev/shm or a tmpfs into a job's cgroups to be accounted against the job's allocation. The best I have come up with is creating a per-job tmpfs from a prolog, removing it from an epilog and setting
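
A sketch of that prolog/epilog pair; the size and path are placeholders, and note that the size= option only caps the tmpfs, it does not charge the pages to the job's cgroup:

    # prolog: create a per-job tmpfs with a hard size cap
    jobdir=/tmp/job_${SLURM_JOB_ID}
    mkdir -p "$jobdir"
    mount -t tmpfs -o size=4g tmpfs "$jobdir"

    # a task prolog can then point the job at it by printing:
    #   echo "export TMPDIR=$jobdir"

    # epilog: tear it down
    umount "$jobdir" && rmdir "$jobdir"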

Re: [slurm-users] how to locate the problem when slurm failed to restrict gpu usage of user jobs

2022-03-23 Thread John Hanks
Do you have a matching Gres=gpu:4 or similar in your node config lines? I'm not sure if that is still required, but we have it in our config, which does work to isolate GPUs to the jobs they are assigned to. griznog On Wed, Mar 23, 2022 at 9:45 AM wrote: > Hi, all: We found a problem that sl
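
For reference, the pieces involved might look like this; node names, counts, and sizes are invented:

    # slurm.conf
    GresTypes=gpu
    NodeName=gpu[01-04] Gres=gpu:4 CPUs=64 RealMemory=512000 State=UNKNOWN

    # gres.conf on each GPU node
    Name=gpu File=/dev/nvidia[0-3]

    # cgroup.conf -- this is what actually hides unassigned GPUs from jobs
    ConstrainDevices=yes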

Re: [slurm-users] slurm-17.11.5 usage of X11

2018-04-13 Thread John Hanks
Hi Philippe, The built-in --x11 support still has some rough edges to it. Due to a timeout issue with getting xauth set up, I've been patching SLURM with these minor changes:

    --- src/common/run_command.c_orig 2018-02-06 15:03:10.0 -0800
    +++ src/common/run_command.c 2018-02-06 15:26:23.525

Re: [slurm-users] Over-riding array limits

2018-03-01 Thread John Hanks
Hi, Short answer: scontrol update jobid=JOBID arraytaskthrottle=NEWLIMIT Long answer: https://bugs.schedmd.com/show_bug.cgi?id=1863 jbh On Sat, Feb 24, 2018 at 5:55 AM, Bill Barth wrote: > We don’t allow array jobs (we have our own tools for packing small jobs into bigger ones), so I can
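
For example (job id and limit invented):

    # raise the throttle on a submitted array job
    scontrol update jobid=12345 arraytaskthrottle=50

    # the submit-time equivalent is the % suffix on --array
    sbatch --array=0-999%50 job.sh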

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread John Hanks
I've used this with some success: https://github.com/JohannesBuchner/verynice. For CPU intensive things it works great, but you have to also set some memory limits in limits.conf if users do any large memory stuff. Otherwise I just use a problem process as a chance to start a conversation with that
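
A sketch of the limits.conf side, with made-up values; the 'as' item caps per-process address space and is expressed in KB:

    # /etc/security/limits.conf on the login/frontend node
    *       hard    as      16777216      # ~16 GB of virtual address space per process
    root    -       as      unlimited     # leave root unconstrained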