We have a node with 8 H100 GPUs that are split into MIG instances. We are using
cgroups. This seems to work fine. Users can do something like
sbatch --gres="gpu:1g.10gb:1"...
and the job starts on the node with the GPUs and CUDA visible devices and the
PyTorch debug shows that the cgroup only g
?
Regards,
Emyr James
Head of Scientific IT
CRG - Centre for Genomic Regulation
From: Emyr James via slurm-users
Sent: 12 July 2024 11:51
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Job Step State
Dear all,
I am working on a script to take
Not sure if this is correct, but I think you need to leave a bit of RAM for the
OS to use, so best not to allow Slurm to allocate ALL of it. I usually take 8G
off to allow for that, which is negligible when our nodes have at least 768GB
of RAM. At least this is my experience when using cgroups.
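As a rough illustration of the arithmetic above, here is a minimal Python sketch that subtracts an 8G reserve from a node's total memory and prints a value in megabytes, the unit slurm.conf uses for RealMemory. The helper name `real_memory_mb` and the exact 768 GiB total are assumptions for illustration only; a real node usually reports somewhat less than its nominal RAM, so check the detected figure first (for example with `slurmd -C`).

```python
def real_memory_mb(total_mb: int, reserve_mb: int = 8 * 1024) -> int:
    """Return the memory (in MB) left for jobs after reserving some for the OS.

    total_mb   -- memory the node actually reports, in MB (assumed input)
    reserve_mb -- amount held back for the OS; 8G as suggested above
    """
    if reserve_mb >= total_mb:
        raise ValueError("reserve must be smaller than total memory")
    return total_mb - reserve_mb


if __name__ == "__main__":
    nominal_total_mb = 768 * 1024  # hypothetical node with exactly 768 GiB
    print(f"RealMemory={real_memory_mb(nominal_total_mb)}")
```

The printed value would go on the node's line in slurm.conf; whether 8G is the right reserve depends on what else runs on the node.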
Emyr Jame
Dear all,
I am working on a script to take completed job accounting data from the Slurm
accounting database and insert the equivalent data into a ClickHouse table for
fast reporting.
I can see that all the information is included in the cluster_job_table and
cluster_job_step_table which seem to
?
Presumably non-cgroup accounting has a similar issue? I.e. it polls RSS and
then the accounting db reports the highest value seen, even though using
getrusage and checking ru_maxrss should be done too?
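To make the distinction concrete, here is a small Python sketch of the getrusage side of that comparison: it runs a child to completion and then reads ru_maxrss for waited-for children, which the kernel tracks as a true peak rather than a polled sample. This only illustrates what getrusage reports; it is not how Slurm's accounting plugins work, and the function name `run_and_report_peak_rss` is made up for the example. On Linux the value is in kilobytes.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run cmd to completion and return ru_maxrss for waited-for children.

    The kernel records the peak, so short-lived spikes that a polling
    loop could miss are still reflected here. Units are platform
    dependent (kilobytes on Linux).
    """
    subprocess.run(cmd, check=True)
    # RUSAGE_CHILDREN covers children that have terminated and been waited for.
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Hypothetical workload: a child that briefly allocates about 50 MB.
    child = [sys.executable, "-c", "buf = b'x' * (50 * 1024 * 1024)"]
    peak = run_and_report_peak_rss(child)
    print(f"peak ru_maxrss of children: {peak}")
```

Note that RUSAGE_CHILDREN reports the largest single descendant rather than a sum, so it can differ from a cgroup's peak usage for multi-process jobs.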
Many thanks,
Emyr James
Head of Scientific IT
CRG - Centre for Genomic Regulation
for Genomic Regulation
From: Emyr James via slurm-users
Sent: 20 May 2024 13:56
To: Thomas Green - Staff in University IT, Research Technologies / Staff
Technoleg Gwybodaeth, Technolegau Ymchwil; Davide DelVento
Cc: slurm-users@lists.schedmd.com
Subject: [slur
.g.
https://github.com/google/cadvisor/issues/3286
Tom
From: Emyr James via slurm-users
Date: Monday, 2
ation
available on this functionality ?
Emyr James
Head of Scientific IT
CRG - Centre for Genomic Regulation
From: Emyr James via slurm-users
Sent: 17 May 2024 01:15
To: Davide DelVento
Cc: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: memory high
(which also uses getrusage) or a variant you will be able to do that.
On Thu, May 16, 2024 at 4:10 PM Emyr James via slurm-users
<slurm-users@lists.schedmd.com> wrote:
Hi,
We are trying out Slurm, having run Grid Engine for a long while.
In grid engine, the cgroups peak memor
Hi,
We are trying out Slurm, having run Grid Engine for a long while.
In Grid Engine, the cgroup peak memory and max_rss are generated at the end of
a job and recorded. It logs the information from the cgroup hierarchy as well
as doing a getrusage call right at the end on the parent pid