On 20-05-2020 00:03, Flynn, David P. (Dave) wrote:
> Where does Slurm keep track of the latest jobid? Since it is persistent
> across reboots, I suspect it’s in a file somewhere.
$ scontrol show config | grep MaxJobId
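(For what it's worth, the last assigned jobid is persisted along with the rest
of slurmctld's state under StateSaveLocation, so one way to locate it, assuming
a stock setup, is:

    $ scontrol show config | grep -E 'StateSaveLocation|FirstJobId|MaxJobId'
    $ ls /var/spool/slurmctld/   # assumed StateSaveLocation; the state files live here

The exact path is site-specific.)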
Hello all,
I just upgraded my cluster to Slurm 20.02.2 from 18.08.7. Previously
we were using a SPANK plugin [1] to give each job its own auto-cleaned tmp
space.
Unfortunately, it looks like the slurm_spank_job_prolog and
slurm_spank_job_epilog functions are no longer getting called at all.
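In case it helps with debugging, two things worth checking, as hedged guesses
rather than a definite diagnosis: that plugstack.conf still lists the plugin
and is readable on every compute node, and whether the SPANK job prolog/epilog
hooks additionally need a Prolog/Epilog configured in slurm.conf:

    $ scontrol show config | grep -Ei 'prolog|epilog|PlugStackConfig'
    $ cat /etc/slurm/plugstack.conf   # assumed path; adjust for your install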
Where does Slurm keep track of the latest jobid? Since it is persistent across
reboots, I suspect it’s in a file somewhere.
--
Dave Flynn
Hi Lisa,
I'm actually referring to the ability to create a reservation that includes a
GPU resource. It doesn't seem to be possible, which seems strange. It would
be very helpful for us to have a floating GPU reservation.
Best,
Chris
--
Christopher Coffey
High-Performance Computing
Northern Arizona University
I am a newbie at Slurm setup, but if by reservable you also mean a consumable
resource, I am able to request GPUs. I have Slurm 20.02.1 and CUDA 10.2. I
just set this up within the last month.
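For reference, a minimal sketch of that kind of setup (node names and device
paths here are illustrative, not our exact config):

    # slurm.conf
    GresTypes=gpu
    NodeName=gpunode01 Gres=gpu:2 ...

    # gres.conf on the GPU node
    Name=gpu File=/dev/nvidia[0-1]

Jobs then request a GPU at submit time with e.g. sbatch --gres=gpu:1 job.sh.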
***
Lisa Weihl, Systems Administrator
I got it working with a Slurm Prolog, but because the Prolog runs for every
job, it creates a TMPDIR every time Slurm runs a job on any node.
Ideally, I’d like to create the TMPDIR only if the job requested --tmp.
I’ve only perused the code a bit, to see how it works, but does this Spank
plugin on
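In case it's useful, a rough Prolog sketch that only acts when --tmp was
requested; the tmp root is an assumed example and GNU grep is assumed for -oP.
A --tmp request shows up as MinTmpDiskNode in scontrol output, where 0 means
it was not requested:

    #!/bin/bash
    # Prolog sketch: create a per-job TMPDIR only when the job asked for --tmp.
    req=$(scontrol show job "$SLURM_JOB_ID" | grep -oP 'MinTmpDiskNode=\K[^ ]+')
    if [ -n "$req" ] && [ "$req" != "0" ]; then
        dir="/local/scratch/$SLURM_JOB_ID"   # assumed tmp root, site-specific
        mkdir -p "$dir"
        chown "$SLURM_JOB_UID" "$dir"        # SLURM_JOB_UID is set in the Prolog environment
    fi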
Hi All,
Can anyone confirm that a GPU is still not a reservable resource? It still
doesn't seem to be possible in 19.05.6. I haven't tried the 20.02 series.
Best,
Chris
--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
On 11/11/18, 1:19 AM, "slurm-users
Hi
One of my users reported a job cancelled before it completed. She got this:
"
slurmstepd: *** JOB 390031 ON bigger4 CANCELLED AT 2020-05-18T22:27:04 ***
"
The job was apparently cancelled by root:
"
sacct -j 390031 --format="jobid,state%30"
       JobID                          State
------------ ------------------------------
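(Side note: sacct normally reports root cancellations as "CANCELLED by 0",
i.e. the uid that issued the cancellation, and the slurmctld log around that
timestamp usually names the trigger:

    $ grep 390031 /var/log/slurmctld.log    # log path is site-specific

That helps distinguish an admin's scancel from automated actions that also run
as root.)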
Hello,
I have a problem with GrpTRES. I set the limits with
sacctmgr --immediate modify user where user= set GrpTRES=cpu=144,node=4
but when the user submits serial jobs, for example 5 jobs, only 4 can run and
the rest are PD with reason=AssocGrpNodeLimit.
I
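One hedged reading of this: each serial job lands on its own node, so the
node=4 part of GrpTRES caps the user at 4 concurrent jobs once they spread
across 4 nodes, which matches reason=AssocGrpNodeLimit. If only the cpu cap
is wanted, clearing the node limit (a value of -1 removes a TRES limit) should
look roughly like:

    $ sacctmgr --immediate modify user where user=<name> set GrpTRES=cpu=144,node=-1

where <name> is a placeholder for the username.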
I was actually looking at something else (tm) when I noticed that
two of our Slurm-controlled resources had different config values
for KillOnBadExit, so I went looking for clues.
I read this:
KillOnBadExit
If set to 1, a step will be terminated immediately if any task is
crashed
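Either way, a quick way to compare the effective setting on the two clusters:

    $ scontrol show config | grep KillOnBadExit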
Hi,
In Lua you can check whether the job_desc.script field is empty:
if (job_desc.script == nil or job_desc.script == '') then
    -- no batch script, so the job was submitted interactively (e.g. srun)
    ...
end
Regards,
Carlos
On Mon, May 18, 2020 at 4:07 PM Stephan Roth wrote:
> Dear all,
>
> Does anybody know of a way to detect whether a job is submitted with
> srun, pr
Hi,
We have a job_submit_limit_interactive plugin here that limits interactive
jobs and can force a partition for such jobs. It also limits the number of
concurrent interactive jobs per user by using the license system. It's
written in C, so compilation is required. It can be found at:
https://git
Hi Erik,
We use a private fork of https://github.com/hpc2n/spank-private-tmp
It has worked quite well for us - jobs (or steps) don’t share a /tmp and during
the prolog all files created for the job/step are deleted.
Users absolutely cannot see each other's temporary files so there’s no issue
ev