“sinfo” can expand compressed hostnames too:
$ sinfo -n lm602-[08,10] -O NodeHost -h
lm602-08
lm602-10
$
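As an aside, "scontrol" can do the same expansion without querying node information at all, e.g.:
$ scontrol show hostnames lm602-[08,10]
lm602-08
lm602-10
$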
-Greg
From: slurm-users on behalf of Alain O' Miniussi
Date: Thursday, 17 August 2023 at 4:53 pm
To: Slurm User Community List
Subject: [EXTERNAL] Re: [slurm-users] extended list of n
Yup – Slurm is specifically tied to MySQL/MariaDB.
To get around this I wrote a C++ application that will extract job records
from Slurm using “sacct” and write them into a PostgreSQL database.
https://gitlab.com/greg.wickham/sminer
The schema used in PostgreSQL is more conduci
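sminer does the heavy lifting, but the general idea can be sketched with stock tools - the database name "slurm_mirror", the table "jobs_raw" and the date range below are only placeholders:
$ sacct -a -X -n -P -S 2023-08-01 -E 2023-08-31 \
    --format=jobid,user,account,partition,state,elapsed,alloctres > jobs.psv
$ psql slurm_mirror -c "\copy jobs_raw from 'jobs.psv' with (format csv, delimiter '|')"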
Following on from what Michael said, the default Slurm configuration is to
allocate only one job per node. If GRES a100_1g.10gb is on the same node, make
sure “SelectType=select/cons_res” is enabled (info at
https://slurm.schedmd.com/cons_res.html) to permit multiple jobs to use the
same node.
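A minimal slurm.conf sketch along those lines - the SelectTypeParameters value is just one common choice, not necessarily the right one for your site:
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
GresTypes=gpu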
Also
entation on those. Are you just creating those files and then including
them in slurm.conf?
Rob
From: slurm-users on behalf of Greg Wickham
Sent: Wednesday, January 18, 2023 1:38 AM
To: Slurm User Community List
Subject: Re: [slurm-users] Maintaining slur
Hi Rob,
Slurm doesn’t have a “validate” parameter, so one must know ahead of time
whether the configuration will work or not.
In answer to your question – yes – on our site the Slurm configuration is
altered outside of a maintenance window.
Depending upon the potential impact of the change,
rmdbd? Not sure.
I have intentionally run slurmdbd + MariaDB on the second node because I
didn't want to overload the primary slurmctld.
I hope you all are getting a picture of how my setup looks.
Thanks,
RC
On 11/1/2022 10:40 AM, Greg Wickham wrote:
Hi Richard,
Slurmctld caches th
Hi Richard,
Slurmctld caches the updates until slurmdbd comes back online.
You can see how many records are pending for the database by using the “sdiag”
command and looking for “DBD Agent queue size”.
If this number grows significantly it means that slurmdbd isn’t available.
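For example (a quick sketch - the label text may vary slightly between Slurm versions):
$ sdiag | grep -i "dbd agent"
DBD Agent queue size: 0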
-Greg
On 01/1
Hi Richard,
We have just over 400 nodes and the StateSaveLocation directory has ~600MB of
data.
The share for SlurmdSpoolDir is about 17GB used across the nodes, but this also
includes logs for each node (without log files it’s < 1GB).
-Greg
On 24/10/2022, 07:19, "slurm-users" wrote:
Hi
Hi Purvesh,
With some caveats, you can do:
$ sacct -N <nodelist> -X -S <starttime> -E <endtime> -P --format=jobid,alloctres
And then post-process the results with a scripting language.
The caveats? . . The -X above returns the job allocation, which in your
case appears to be everything you need. However for a job or
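For instance, a throwaway awk one-liner to pull the GPU count out of AllocTRES (the dates below are just an example):
$ sacct -n -X -P -S 2023-08-01 -E 2023-08-31 --format=jobid,alloctres | \
    awk -F'|' '{ n=0; if (match($2, /gres\/gpu=[0-9]+/)) n=substr($2, RSTART+9, RLENGTH-9); print $1, n }'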
Hi John, Mark,
We use a spank plugin
https://gitlab.com/greg.wickham/slurm-spank-private-tmpdir (this was derived
from other authors but modified for functionality required on site).
It can bind tmpfs mount points to the user's cgroup allocation; additionally,
bind options can be provided (ie: l
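For reference, SPANK plugins are enabled via plugstack.conf with a line of the form "required <plugin.so> [args]" - the path and arguments below are illustrative only, the plugin's README has the real ones:
required /usr/lib64/slurm/private-tmpdir.so base=/local/scratch mount=/tmp mount=/var/tmp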
If it’s possible to see other GPUs within a job, then that means cgroups
aren’t being used.
Look at Slurm's cgroup documentation
(https://slurm.schedmd.com/cgroup.conf.html).
With cgroups activated, an `nvidia-smi` will only show the GPU allocated to the
job.
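The settings involved, as a rough sketch (standard parameter names, but treat the combination as a starting point rather than a drop-in config):
cgroup.conf:
ConstrainDevices=yes
slurm.conf:
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup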
-greg
From: slurm-user
Hi Chris,
You mentioned “But trials using this do not seem to be fruitful so far.” . .
why?
In our job_submit.lua there is:
if job_desc.shared == 0 then
    slurm.user_msg("exclusive access is not permitted with GPU jobs.")
    slurm.user_msg("Remove '--exclusive' from your job submission.")
    return slurm.ERROR
end
As others have commented, some information is lost when it is stored in the
database.
To keep historically accurate data on the job, run a script (refer to
PrologSlurmctld in slurm.conf) that runs an "scontrol show -d job <jobid>" and
drops the output into a local file.
Using " PrologSlurmctld" is neat, as it
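A minimal sketch of such a PrologSlurmctld script - the output path is only an example, and SLURM_JOB_ID is available in the prolog's environment:
#!/bin/bash
# Save a point-in-time copy of the full job record before the job starts.
scontrol show -d job "$SLURM_JOB_ID" > "/var/spool/slurm/job_records/${SLURM_JOB_ID}.txt"
exit 0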
Hi Diego,
Disclaimer: A little bit of shameless self-promotion.
We're using an application I wrote to inject slurm accounting records into a
PostgreSQL database. The
data is extracted from Slurm using "sacct".
From there it's possible to use SQL queries to mine the raw slurm data.
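For example, once the records are loaded (the database and table names here are placeholders):
$ psql slurm_mirror -c "select account, count(*) from jobs_raw group by account order by 2 desc"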
https://
Hi Erik,
We use a private fork of https://github.com/hpc2n/spank-private-tmp
It has worked quite well for us - jobs (or steps) don’t share a /tmp and during
the prolog all files created for the job/step are deleted.
Users absolutely cannot see each others temporary files so there’s no issue
ev
Something to try . .
If you restart “slurmctld” does the new QOS apply?
We had a situation where slurmdbd was running as a different user than
slurmctld and hence sacctmgr changes weren’t being reflected in slurmctld.
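A quick way to rule that out (paths are the typical defaults):
$ scontrol show config | grep -i slurmuser
$ grep -i slurmuser /etc/slurm/slurmdbd.conf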
-greg
On 27 Apr 2020, at 12:57, Simon Andrews
mailto:simon.andr...@babr
-GPU nodes and a plethora of 1 GPU jobs - during heavy use the user may not have access to the GPU they require).
Has anyone any experience with changing GPU permissions during prolog /
epilogue?
thanks,
-greg
--
Dr. Greg Wickham
Advanced Computing Infrastructure Team Lead
Advanced Computing
hael Di Domenico
mailto:mdidomeni...@gmail.com>> wrote:
I've seen the same error, I don't think it's you. But I don't know
what the cause is either; I didn't have time to look into it, so I
backed up to pmix 2.2.1 which seems to work fine.
On Tue, Jan 22, 2019 at 1
Hi All,
I’m trying to build PMIx 3.1.1 against Slurm 18.08.4; however, in the Slurm
PMIx plugin I get a fatal error:
pmixp_client.c:147:28: error: ‘flag’ undeclared (first use in this
function)
PMIX_VAL_SET(&kvp->value, flag, 0);
Is there something wrong with my build environme
de in the prod partition to drain without affecting the
> node status in the maint partition. I don't believe I can do this
> though. I believe i have to change the slurm.conf and reconfigure to
> add/remove nodes from one partition or the other
>
> if anyone has a better solut
there a recommended “kernel overhead” memory (either % or absolute value)
that we should deduct from the total physical memory?
thanks,
-greg
--
Dr. Greg Wickham
Advanced Computing Infrastructure Team Lead
Advanced Computing Core Laboratory
King Abdullah University of Science and Technology