Re: [slurm-users] How does slurm keep track of latest jobid

2020-05-19 Thread Ole Holm Nielsen
On 20-05-2020 00:03, Flynn, David P. (Dave) wrote: Where does Slurm keep track of the latest jobid? Since it is persistent across reboots, I suspect it’s in a file somewhere. $ scontrol show config | grep MaxJobId
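The one-liner above shows the configured id ceiling. As a hedged aside: slurmctld persists scheduler state, including the last assigned job id, under StateSaveLocation, and ids wrap from MaxJobId back around to FirstJobId. A minimal sketch of that wrap-around; the numeric values are illustrative assumptions, not authoritative defaults:

```shell
# Wrap-around sketch (assumption: ids wrap from MaxJobId back to FirstJobId;
# the values below are illustrative, not Slurm's actual defaults).
first=1
max=67043328
last=$max                              # pretend the last id hit the ceiling
next=$(( last + 1 > max ? first : last + 1 ))
echo "next job id: $next"
```

To see the real values on a live cluster, `scontrol show config | grep -E 'MaxJobId|FirstJobId|StateSaveLocation'` would show all three settings.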

[slurm-users] Spank Prolog/Epilog functions within 20.02

2020-05-19 Thread Adam Tygart
Hello all, I just upgraded my cluster to Slurm 20.02.2 from 18.08.7. Previously we were using a spank plugin [1] to make tmp space unique for each job and auto-cleaning. Unfortunately, it looks like the slurm_spank_job_prolog and slurm_spank_job_epilog functions are no longer getting called at al

[slurm-users] How does slurm keep track of latest jobid

2020-05-19 Thread Flynn, David P. (Dave)
Where does Slurm keep track of the latest jobid? Since it is persistent across reboots, I suspect it’s in a file somewhere. — Dave Flynn

Re: [slurm-users] Reserving a GPU (Christopher Benjamin Coffey)

2020-05-19 Thread Christopher Benjamin Coffey
Hi Lisa, I'm actually referring to the ability to create a reservation that includes a GPU resource. It doesn't seem to be possible, which seems strange. A floating GPU reservation would be very helpful for us. Best, Chris -- Christopher Coffey High-Performance Computing Northern

Re: [slurm-users] Reserving a GPU (Christopher Benjamin Coffey)

2020-05-19 Thread Lisa Kay Weihl
I am a newbie at the Slurm setup, but if by reservable you also mean a consumable resource: I am able to request GPUs, and I have Slurm 20.02.1 and CUDA 10.2. I just set this up within the last month. *** Lisa Weihl Systems Administrato

Re: [slurm-users] Reset TMPDIR for All Jobs

2020-05-19 Thread Ellestad, Erik
I got it working with a Slurm Prolog, but because the Prolog runs on every job, it creates a TMPDIR every time Slurm runs a job on any node. Ideally, I’d like to create the TMPDIR ONLY if the job requested --tmp. I’ve only perused the code a bit, to see how it works, but does this Spank plugin on
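One possible approach (a sketch, not the plugin's actual mechanism): a Prolog can ask `scontrol` whether the job set a per-node tmp-disk request and create the directory only in that case. The `MinTmpDiskNode` field and the sample line below are assumptions about scontrol's output format; a canned string stands in for the real call.

```shell
# Hypothetical prolog fragment: act only when the job requested --tmp.
# In a real prolog you would instead run:
#   job_info=$(scontrol show job "$SLURM_JOB_ID")
# Here a sample line stands in for that call (an assumption about its format).
job_info="MinCPUsNode=1 MinMemoryNode=4G MinTmpDiskNode=10G"
tmp_req=$(printf '%s\n' "$job_info" | grep -o 'MinTmpDiskNode=[^ ]*' | cut -d= -f2)
if [ -n "$tmp_req" ] && [ "$tmp_req" != "0" ]; then
    echo "job requested --tmp ($tmp_req): would create per-job TMPDIR here"
fi
```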

Re: [slurm-users] Reserving a GPU

2020-05-19 Thread Christopher Benjamin Coffey
Hi All, Can anyone confirm that GPU is still not a reservable resource? It still doesn't seem to be possible in 19.05.6. I haven't tried the 20.02 series. Best, Chris -- Christopher Coffey High-Performance Computing Northern Arizona University 928-523-1167 On 11/11/18, 1:19 AM, "slurm-users

[slurm-users] Job cancelled by root - why?

2020-05-19 Thread Torkil Svensgaard
Hi One of my users reported a job cancelled before it completed. She got this: " slurmstepd: *** JOB 390031 ON bigger4 CANCELLED AT 2020-05-18T22:27:04 *** " The job was apparently cancelled by root: " sacct -j 390031 --format="jobid,state%30" JobID State --
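For what it's worth, when sacct shows a CANCELLED state it usually records the uid that issued the cancel, and "by 0" means root, which also covers slurmctld itself enforcing a limit (e.g. time or memory). A small sketch of reading that field; the sample string stands in for real sacct output:

```shell
# Classify the canceller from sacct's State column. The sample string stands
# in for the output of: sacct -j 390031 --format="jobid,state%30"
state="CANCELLED by 0"
case "$state" in
    *"by 0"*)    cause="root (uid 0) - often slurmctld enforcing a limit" ;;
    *CANCELLED*) cause="another uid: check who issued the cancel" ;;
    *)           cause="not cancelled" ;;
esac
echo "$cause"
```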

[slurm-users] problems with number of jobs with GrpTres

2020-05-19 Thread Alberto Morillas, Angelines
Hello, I have a problem with GrpTres. I specify the limits with sacctmgr --immediate modify user where user= set GrpTres=cpu=144,node=4, but when the user sends serial jobs, for example 5 jobs, only 4 can run and the rest of the jobs are PD with the reason=AssocGrpNodeLimit. I
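A likely explanation (my reading, not confirmed in this thread): each running serial job occupies one node, so five concurrent jobs need five node-slots against GrpTRES node=4, and the fifth waits with AssocGrpNodeLimit even though only 5 CPUs are in use. If the intent is to cap CPUs rather than node count, one hedged fix is to clear the node term (alice is a hypothetical user name):

```shell
# Sketch, assuming the goal is a CPU cap only; setting a TRES to -1
# clears that limit in sacctmgr.
sacctmgr --immediate modify user where user=alice set GrpTRES=cpu=144,node=-1
```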

[slurm-users] KillOnBadExit or srun's -K: step, job, task, process all get a mention in dispatches

2020-05-19 Thread Kevin Buckley
I was actually looking at something else (tm) when I noticed that two of our Slurm controlled resources had different config values for KillOnBadExit, and so I went looking for clues. I read this: KillOnBadExit If set to 1, a step will be terminated immediately if any task is crashed

Re: [slurm-users] How to detect Job submission by srun / interactive jobs

2020-05-19 Thread Carlos Fenoy
Hi, In Lua you can check whether the job_desc.script field is empty: if (job_desc.script == nil or job_desc.script == '') then ... Regards, Carlos On Mon, May 18, 2020 at 4:07 PM Stephan Roth wrote: > Dear all, > > Does anybody know of a way to detect whether a job is submitted with > srun, pr

Re: [slurm-users] [External] How to detect Job submission by srun / interactive jobs

2020-05-19 Thread Yair Yarom
Hi, We have here a job_submit_limit_interactive plugin that limits interactive jobs and can force a partition for such jobs. It also limits the number of concurrent interactive jobs per user by using the license system. It's written in c, so compilation is required. It can be found in: https://git

Re: [slurm-users] Reset TMPDIR for All Jobs

2020-05-19 Thread Greg Wickham
Hi Erik, We use a private fork of https://github.com/hpc2n/spank-private-tmp It has worked quite well for us - jobs (or steps) don’t share a /tmp and during the prolog all files created for the job/step are deleted. Users absolutely cannot see each others temporary files so there’s no issue ev