Hi,
I'm very new to Slurm and am trying to understand basic concepts. One of
them is the "Multifactor Priority Plugin". For this I submitted some jobs
and looked at sshare output. To my surprise I don't get any numbers for
"RawUsage"; regardless of what I do, RawUsage stays 0 (same in "scontrol
show as
I had to do it and had no hints)
Sorry for bothering you,
Matthias
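(For context, a minimal sketch of the pieces involved; the values below are
generic examples, not taken from this setup. RawUsage normally only
accumulates when accounting storage and the multifactor priority plugin are
both active.)

  # slurm.conf (example values)
  PriorityType=priority/multifactor
  AccountingStorageType=accounting_storage/slurmdbd
  JobAcctGatherType=jobacct_gather/linux

  # quick checks after submitting a few jobs
  scontrol show config | grep -E 'PriorityType|AccountingStorage'
  sshare -o Account,User,RawShares,RawUsage,FairShare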
On 06.04.21 at 17:06, Matthias Leopold wrote:
Hi,
I'm very new to Slurm and am trying to understand basic concepts. One of them
is the "Multifactor Priority Plugin". For this I submitted some jobs and
looked at ss
Hi,
can someone please explain to me why it's possible to set Grp* resource
limits on user associations? What is the use of this? As far as I
understand the documentation, accounts can have children, but users cannot.
I'm still a newbie exploring Slurm in a test environment, so please excuse
possibly stupid questions.
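(For reference, this is the kind of setting in question; a sketch with
made-up user/account names and limits.)

  # show existing associations and any Grp* limits on them
  sacctmgr show assoc format=Cluster,Account,User,GrpTRES,GrpJobs

  # set a Grp* limit directly on a user association
  sacctmgr modify user where name=alice account=proj1 set GrpTRES=cpu=32 GrpJobs=10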
Hi,
I'm testing how limiting memory resources works in Slurm.
I'm using TaskPlugin=affinity,cgroup (slurm.conf) and
ConstrainRAMSpace=yes (cgroup.conf) and have set a MaxMemPerCPU limit on
the partition.
To my surprise MaxMemPerCPU is enforced as long as the job submission
requests a memory limit
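(For context, a sketch of the configuration being described; partition name
and numbers are made up. DefMemPerCPU is commonly set alongside MaxMemPerCPU
so that jobs without an explicit memory request still get a per-CPU default
for the limit to act on.)

  # slurm.conf (sketch)
  TaskPlugin=task/affinity,task/cgroup
  PartitionName=test Nodes=node[01-02] DefMemPerCPU=2048 MaxMemPerCPU=4096 State=UP

  # cgroup.conf (sketch)
  ConstrainRAMSpace=yes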
that is expected behavior,
but it would keep you from having to do something with a plugin.
Jeff
*From:* slurm-users on behalf of Matthias Leopold
*Sent:* Thursday, April 22, 2021 5:13 AM
*To:* Slurm User Community List
*Subject:*
Hi,
I'm trying to prepare for using Slurm with DGX A100 systems with MIG
configuration. I will have several gres:gpu types there so I tried to
reproduce the situation described in "Specific limits over GRES" from
https://slurm.schedmd.com/resource_limits.html, but I can't.
In my test environ
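(For context, the "Specific limits over GRES" section boils down to tracking
the typed GRES in accounting before a per-type limit can be enforced; a
sketch where the type names and values are assumptions.)

  # slurm.conf - typed GRES must be listed here for per-type limits to work
  # (the a100 / MIG type names below are assumptions)
  AccountingStorageTRES=gres/gpu,gres/gpu:a100,gres/gpu:1g.10gb

  # example limit on a QOS (QOS name and value are made up)
  sacctmgr modify qos where name=normal set MaxTRESPerUser=gres/gpu:1g.10gb=2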
Hi,
I'm compiling Slurm with Ansible playbooks from the NVIDIA deepops framework
(https://github.com/NVIDIA/deepops). I'm trying to add UCX support. How
can I tell if UCX is actually included in the resulting binaries
(without actually using Slurm)? I was looking at executables and *.so
files with
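(In case it is useful, the checks would look roughly like this; paths are
assumptions for a Debian/Ubuntu layout, and as far as I understand UCX
support is tied to the PMIx MPI plugin, so that is the library to inspect.)

  # was UCX picked up at configure time? (build directory path is an assumption)
  grep -i ucx /path/to/slurm-build/config.log

  # does the PMIx plugin link against the UCX libraries?
  # (file may be mpi_pmix.so or mpi_pmix_v<N>.so depending on the PMIx version)
  ldd /usr/lib/x86_64-linux-gnu/slurm/mpi_pmix.so | grep -iE 'libucp|libucs'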
On 12.01.22 at 17:54, Matthias Leopold wrote:
Hi,
I'm compiling Slurm with Ansible playbooks from the NVIDIA deepops framework
(https://github.com/NVIDIA/deepops). I'm trying to add UCX support. How
can I tell if UCX is actually included in the resulting binaries
(without actu
Hi,
we have 2 DGX A100 systems which we would like to use with Slurm. We
want to use the MIG feature for _some_ of the GPUs. As I somehow
suspected I couldn't find a working setup for this in Slurm yet. I'll
describe the configuration variants I tried after creating the MIG
instances, it migh
devices. But there are downsides, like no multi-node MPI
jobs, and in general I still can't believe there is such a limitation.
Thanks again for any feedback,
Matthias
On 27.01.22 at 16:27, Matthias Leopold wrote:
Hi,
we have 2 DGX A100 systems which we would like to use with Slurm. We
want t
gives me
everything I want, sorry for bothering you.
Matthias
On 27.01.22 at 16:27, Matthias Leopold wrote:
Hi,
we have 2 DGX A100 systems which we would like to use with Slurm. We
want to use the MIG feature for _some_ of the GPUs. As I somehow
suspected I couldn't find a working setup
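(For anyone searching later, the general shape of a MIG-aware GRES setup; a
sketch where node name, counts and type strings are assumptions, and the type
names Slurm actually detects can be checked with "slurmd -G" on the node.)

  # gres.conf on the DGX node - let slurmd discover GPUs/MIG devices via NVML
  AutoDetect=nvml

  # slurm.conf (hypothetical node name, counts and MIG type strings)
  GresTypes=gpu
  NodeName=dgx01 Gres=gpu:a100:4,gpu:a100_1g.10gb:8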
Hi,
I know this might be too simple a question for a bigger topic, but I'll
just try: is there something like seff for measuring the efficiency of
NVIDIA GPU usage in Slurm jobs?
Thanks
Matthias
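(Not an answer, but for reference: if the site records GPU usage TRES, sacct
can at least show the raw numbers a seff-like tool would need; the job ID is
a placeholder and which TRES appear depends on how job accounting is set up.)

  sacct -j 12345 --format=JobID,Elapsed,AllocTRES%60,TRESUsageInAve%80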
Hi,
I want to access the kernel "user" keyrings inside a Slurm job on an
Ubuntu 20.04 node. I'm not an expert on keyrings (yet); I just
discovered that inside a Slurm job a keyring for "user: invocation_id"
is used, which seems to be shared across all users of the executing
Slurm node (other u
--
Matthias Leopold
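(For reference, a quick way to compare the keyrings a job step gets with
those of a login shell; keyctl comes from the keyutils package and the
partition name is an assumption.)

  srun -p debug keyctl show @s   # session keyring as seen inside the job
  srun -p debug keyctl show @u   # "user" keyring for the job's UID
  keyctl show @u                 # the same, from a normal login shell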
Hi,
I'm trying to use AllowGroups for partition configuration in my Slurm
21.08 cluster. Unexpectedly this doesn't seem to work. My user can't
submit jobs although he is a member of the group mentioned in AllowGroups:
srun: error: Unable to allocate resources: User's group not permitted to
use this partition
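(For context, the pieces involved look roughly like this; partition, node
and group names are made up.)

  # slurm.conf (sketch)
  PartitionName=gpu Nodes=node[01-04] AllowGroups=gpuusers State=UP

  # checks on the slurmctld host: group membership as the OS reports it
  id someuser
  getent group gpuusers

  # pick up slurm.conf changes / refresh cached group information
  scontrol reconfigure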
't enough for certain configuration changes.
Regards,
Marko
On Tue, Jul 4, 2023 at 3:57 AM Matthias Leopold
<matthias.leop...@meduniwien.ac.at> wrote:
Hi,
I'm trying to use AllowGroups for partition configuration in my Slurm
21.08 cluster. Unexpectedly this
On 05/07/2023 17:17, Matthias Leopold wrote:
Thanks, but unfortunately that didn't help.
Regards,
Matthias
On 05.07.23 at 17:59, Marko Markoc wrote:
Hi Matthias,
Before you start digging deeper into this, I would recommend
restarting the `slurmctld` service. I've had simila
Hi,
not sure if this is the right place:
Our Slurm 21.08 is compiled against NVML from CUDA 11.4 for
"AutoDetect=nvml" support in gres.conf. Currently we use A100 GPUs; I
would like to know whether we could use H100 GPUs with this setup or
whether I need a newer NVML (what version?). I didn't find anything
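(For reference: the NVML library that "AutoDetect=nvml" loads at run time
ships with the NVIDIA driver, while CUDA mainly provides the headers used at
build time, so checking the installed driver is the first step; the commands
below are the usual ones, adjust as needed.)

  # driver (and with it NVML) version plus GPU name
  nvidia-smi --query-gpu=driver_version,name --format=csv

  # which NVML library would be resolved on this node
  ldconfig -p | grep libnvidia-ml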
Hi,
I want to change Gres definition for a Node
from
NodeName=s0-n10 Gres=gpu:a100:5
to
NodeName=s0-n10 Gres=gpu:a100-sxm4-80gb:5
-> The hardware stays the same, only the Gres name changes;
a100-sxm4-80gb is already defined in the cluster.
When I do this online, will it affect running jobs on the node?
Slur
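(For completeness, if gres.conf lists the devices explicitly, its entry would
change the same way; a sketch where the device paths are assumptions, and
with AutoDetect this may not apply at all.)

  # gres.conf before
  NodeName=s0-n10 Name=gpu Type=a100 File=/dev/nvidia[0-4]
  # gres.conf after
  NodeName=s0-n10 Name=gpu Type=a100-sxm4-80gb File=/dev/nvidia[0-4]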
Hi,
I need to take care of a 17.02 Slurm cluster (I'm preparing it for
upgrades). I see that slurmdbd logs various "cluster not registered"
messages at startup (DBD_CLUSTER_TRES, DBD_JOB_START, DBD_STEP_START), but
I don't see a real problem. Accounting works. Do I have to worry? Can
this be re
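(For reference, a quick read-only check that the cluster is registered with
slurmdbd:)

  sacctmgr show cluster format=Cluster,ControlHost,ControlPort,RPC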
Hi,
I'm trying to compile Slurm with NVIDIA NVML support, but the result is
unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but when
I do "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" there is no
reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (which I would
expect).
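(In case it helps, two other ways to see whether the plugin can actually use
NVML, independent of what ldd shows; the plugin path is an assumption.)

  # does the plugin reference NVML symbols at all?
  strings /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so | grep -i nvml | head

  # does GPU autodetection work on the node? (prints detected GRES and exits)
  slurmd -G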
On Wed, Nov 13, 2024 at 10:21 AM Matthias Leopold via slurm-users
<slurm-users@lists.schedmd.com> wrote:
Hi,
I'm trying to compile Slurm with NVIDIA NVML support, but the result is
unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so,
Hi,
I compiled and installed Slurm 24.05 on Ubuntu 22.04 following this
tutorial: https://www.schedmd.com/slurm/installation-tutorial/
The systemd service files come from the deb packages that this produces.
Do I have to worry that slurmctld and slurmd don't write PID files
although SlurmctldPidFile
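(For context, the two things to compare here; neither command changes
anything.)

  # what slurm.conf expects
  scontrol show config | grep -i pidfile

  # what the packaged systemd units actually do
  systemctl cat slurmctld slurmd | grep -iE 'Type=|PIDFile=|ExecStart='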
Hi,
I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and the NVIDIA
deepops framework a couple of years ago. It is based on Ubuntu 20.04 and
makes use of the NVIDIA pyxis/enroot container solution. For operational
validation I used the nccl-tests application in a container. nccl-tests
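(For reference, the kind of validation run meant here, via pyxis/enroot; the
container image and test parameters are placeholders, not the original setup.)

  srun -N 2 --ntasks-per-node=8 --gpus-per-node=8 \
       --container-image=<your-nccl-tests-image> \
       all_reduce_perf -b 8 -e 1G -f 2 -g 1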
*From:* Davide DelVento
*Date:* Thursday, March 27, 2025 at 7:41 AM
*To:* Matthias Leopold
*Cc:* Slurm User Community List
*Subject:* [EXTERNAL] [slurm-users] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI
Hi Matthias,
I see. It does not freak me out. Unfortunately I have very little
experience working wit
ver-else-you-need" (which obviously may or may not be relevant for
your case).
Cheers,
Davide
On Wed, Mar 26, 2025 at 12:51 PM Matthias Leopold via slurm-users
<slurm-users@lists.schedmd.com> wrote:
Hi,
I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and N
seeing the message you have in your
original post?
Howard
On 3/27/25, 9:20 AM, "Matthias Leopold" <matthias.leop...@meduniwien.ac.at> wrote:
Hi Howard,
thanks, but my Slurm 24.05 definitely has pmix support (visible in "srun
--mpi=list") and it uses it through "
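(For anyone following along, the quick checks referenced here:)

  srun --mpi=list                        # MPI plugin types this build supports
  scontrol show config | grep -i mpidefault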
Thanks for all the replies. I'll take away the hints about running
slurmctld/slurmdbd on separate nodes and disabling systemd units when
upgrading (I thought of that).
Matthias
On 06.03.25 at 17:04, Matthias Leopold via slurm-users wrote:
Hi,
I'm building Slurm Debian packages fr
Hi,
I'm building Slurm Debian packages from SchedMD sources using this
tutorial https://www.schedmd.com/slurm/installation-tutorial/.
Now I tried upgrading (a minor release upgrade within 24.05) using these
packages. https://slurm.schedmd.com/upgrades.html tells me to upgrade
(a) slurmdbd (b) slurmctld
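(For reference, that order turned into commands; a sketch where the package
names are assumed to match what the SchedMD Debian packaging produces and
the version glob is a placeholder.)

  # 1) slurmdbd first (after backing up the accounting database)
  systemctl stop slurmdbd
  apt install ./slurm-smd_24.05*.deb ./slurm-smd-slurmdbd_24.05*.deb
  systemctl start slurmdbd

  # 2) then slurmctld
  systemctl stop slurmctld
  apt install ./slurm-smd-slurmctld_24.05*.deb
  systemctl start slurmctld

  # 3) then slurmd on the compute nodes, then client commands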