[slurm-users] RawUsage 0??

2021-04-06 Thread Matthias Leopold
Hi, I'm very new to Slurm and am trying to understand basic concepts. One of them is the "Multifactor Priority Plugin". For this I submitted some jobs and looked at the sshare output. To my surprise I don't get any numbers for "RawUsage"; regardless of what I do, RawUsage stays 0 (same in "scontrol show as

Re: [slurm-users] RawUsage 0??

2021-04-07 Thread Matthias Leopold
I had to do it and had no hints) Sorry for bothering you Matthias On 06.04.21 at 17:06, Matthias Leopold wrote: Hi, I'm very new to Slurm and am trying to understand basic concepts. One of them is the "Multifactor Priority Plugin". For this I submitted some jobs and looked at ss

[slurm-users] Grp* Resource Limits on User Associations

2021-04-16 Thread Matthias Leopold
Hi, can someone please explain to me why it's possible to set Grp* resource limits on user associations? What's the use for this? As far as I understand the documentation, accounts can have children, but users cannot. I'm still a newbie exploring Slurm in a test environment, please excuse maybe stup

[slurm-users] limiting memory usage when submission doesn't specify memory requirements?

2021-04-22 Thread Matthias Leopold
Hi, I'm testing how limiting memory resources works in Slurm. I'm using TaskPlugin=affinity,cgroup (slurm.conf) and ConstrainRAMSpace=yes (cgroup.conf) and have set a MaxMemPerCPU limit on the partition. To my surprise MaxMemPerCPU is enforced as long as the job submission requests a memory li

Re: [slurm-users] limiting memory usage when submission doesn't specify memory requirements?

2021-04-23 Thread Matthias Leopold
at is expected behavior, but it would keep you from having to do something with a plugin. Jeff *From:* slurm-users on behalf of Matthias Leopold *Sent:* Thursday, April 22, 2021 5:13 AM *To:* Slurm User Community List *Su

[slurm-users] Specific limits over GRES - still relevant?

2021-07-01 Thread Matthias Leopold
Hi, I'm trying to prepare for using Slurm with DGX A100 systems with MIG configuration. I will have several gres:gpu types there so I tried to reproduce the situation described in "Specific limits over GRES" from https://slurm.schedmd.com/resource_limits.html, but I can't. In my test environ

[slurm-users] Building Slurm with UCX support

2022-01-12 Thread Matthias Leopold
Hi, I'm compiling Slurm with Ansible playbooks from the NVIDIA deepops framework (https://github.com/NVIDIA/deepops). I'm trying to add UCX support. How can I tell if UCX is actually included in the resulting binaries (without actually using Slurm)? I was looking at executables and *.so files with

Re: [slurm-users] Building Slurm with UCX support

2022-01-12 Thread Matthias Leopold
On 12.01.22 at 17:54, Matthias Leopold wrote: Hi, I'm compiling Slurm with Ansible playbooks from the NVIDIA deepops framework (https://github.com/NVIDIA/deepops). I'm trying to add UCX support. How can I tell if UCX is actually included in the resulting binaries (without actu

[slurm-users] addressing NVIDIA MIG + non MIG devices in Slurm

2022-01-27 Thread Matthias Leopold
Hi, we have 2 DGX A100 systems which we would like to use with Slurm. We want to use the MIG feature for _some_ of the GPUs. As I somehow suspected I couldn't find a working setup for this in Slurm yet. I'll describe the configuration variants I tried after creating the MIG instances, it migh

Re: [slurm-users] addressing NVIDIA MIG + non MIG devices in Slurm - within one node

2022-01-27 Thread Matthias Leopold
devices. But there are downsides, like no multi-node MPI jobs, and in general I still can't believe there is such a limitation. thx again for any feedback Matthias On 27.01.22 at 16:27, Matthias Leopold wrote: Hi, we have 2 DGX A100 systems which we would like to use with Slurm. We want t

Re: [slurm-users] addressing NVIDIA MIG + non MIG devices in Slurm - solved

2022-01-31 Thread Matthias Leopold
ives me everything I want, sorry for bothering you. Matthias On 27.01.22 at 16:27, Matthias Leopold wrote: Hi, we have 2 DGX A100 systems which we would like to use with Slurm. We want to use the MIG feature for _some_ of the GPUs. As I somehow suspected I couldn't find a working setup

[slurm-users] seff for NVIDIA GPU usage?

2022-06-07 Thread Matthias Leopold
Hi, I know this might be too simple a question for a bigger topic, but I'll just try: is there something like seff for measuring the efficiency of NVIDIA GPU usage in Slurm jobs? thx Matthias

[slurm-users] Kernel keyrings on Slurm node inside Slurm job

2022-08-23 Thread Matthias Leopold
Hi, I want to access the kernel "user" keyrings inside a Slurm job on an Ubuntu 20.04 node. I'm not an expert on keyrings (yet), I just discovered that inside a Slurm job a keyring for "user: invocation_id" is used, which seems to be shared across all users of the executing Slurm node (other u

Re: [slurm-users] Kernel keyrings on Slurm node inside Slurm job

2022-08-25 Thread Matthias Leopold
The Rachel and Selim Benin School of Computer Science and Engineering | The Hebrew University of Jerusalem | T +972-2-5494522 | F +972-2-5494522 | ir...@cs.huji.ac.il -- Matthias Leopold

[slurm-users] AllowGroups for Partition not working?

2023-07-04 Thread Matthias Leopold
Hi, I'm trying to use AllowGroups for partition configuration in my Slurm 21.08 cluster. Unexpectedly this doesn't seem to work. My user can't submit jobs although he is a member of the group mentioned in AllowGroups: srun: error: Unable to allocate resources: User's group not permitted to use thi

Re: [slurm-users] AllowGroups for Partition not working?

2023-07-05 Thread Matthias Leopold
't enough for certain configuration changes. Regards, Marko On Tue, Jul 4, 2023 at 3:57 AM Matthias Leopold <matthias.leop...@meduniwien.ac.at> wrote: Hi, I'm trying to use AllowGroups for partition configuration in my Slurm 21.08 cluster. Unexpectedly this

Re: [slurm-users] AllowGroups for Partition not working?

2023-07-06 Thread Matthias Leopold
On 05/07/2023 17:17, Matthias Leopold wrote: Thanks, but unfortunately that didn't help. Regards, Matthias On 05.07.23 at 17:59, Marko Markoc wrote: Hi Matthias, Before you start digging deeper into this, I would recommend restarting the `slurmctld` service. I've had simila

[slurm-users] Slurm + NVIDIA H100 + NVML Version

2023-08-21 Thread Matthias Leopold
Hi, not sure if this is the right place: Our Slurm 21.08 is compiled against NVML from CUDA 11.4 for "AutoDetect=nvml" support in gres.conf. Currently we use A100 GPUs. I would like to know if we could use H100 GPUs with this setup or if I need a newer NVML (what version?). I didn't find anything

[slurm-users] Reconfigure Gres for Node online?

2023-12-07 Thread Matthias Leopold
Hi, I want to change the Gres definition for a node from NodeName=s0-n10 Gres=gpu:a100:5 to NodeName=s0-n10 Gres=gpu:a100-sxm4-80gb:5. The hardware stays the same, only the Gres name changes; a100-sxm4-80gb is already defined in the cluster. When I do this online, will it affect running jobs on the node? Slur
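
The two Gres values quoted in this message differ in a single field. A minimal Python sketch, assuming only the three-field name:type:count form shown in the message (the helper parse_gres is hypothetical and not part of Slurm), makes that comparison explicit. It says nothing about how Slurm treats running jobs when the definition changes.

```python
from typing import NamedTuple


class Gres(NamedTuple):
    name: str
    type: str
    count: int


def parse_gres(spec: str) -> Gres:
    """Parse a 'name:type:count' Gres string as written in the message.

    Hypothetical helper for illustration; it handles only this three-field form.
    """
    name, gres_type, count = spec.split(":")
    return Gres(name, gres_type, int(count))


old = parse_gres("gpu:a100:5")
new = parse_gres("gpu:a100-sxm4-80gb:5")

# Collect the fields whose values differ between the two definitions.
changed = [f for f in Gres._fields if getattr(old, f) != getattr(new, f)]
print(changed)
```

Here only the type label differs, while the Gres name (gpu) and the count (5) are identical in both definitions.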

[slurm-users] slurmdbd 17.02: "cluster not registered" (but things work)

2024-02-19 Thread Matthias Leopold via slurm-users
Hi, I need to take care of a 17.02 Slurm cluster (I'm preparing it for upgrades). I see that slurmdbd logs various "cluster not registered" messages at startup (DBD_CLUSTER_TRES,DBD_JOB_START,DBD_STEP_START), but I don't see a real problem. Accounting works. Do I have to worry? Can this be re

[slurm-users] Slurm and NVIDIA NVML

2024-11-13 Thread Matthias Leopold via slurm-users
Hi, I'm trying to compile Slurm with NVIDIA NVML support, but the result is unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but when I do "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" there is no reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (which I would expect).
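
The ldd check described in this message can be scripted. The following is a sketch only: the function names are hypothetical, the plugin path is the one quoted in the message, and the script simply reports whether the library name appears in ldd's output. Note that ldd lists only link-time dependencies, so a library loaded at run time via dlopen would not appear; whether that applies here is not stated in the snippet.

```python
import os
import subprocess


def references_library(ldd_output: str, library: str) -> bool:
    """Return True if any line of ldd output names the given library."""
    return any(library in line for line in ldd_output.splitlines())


def plugin_links_against(plugin_path: str, library: str) -> bool:
    """Run ldd on plugin_path and look for library among its dependencies."""
    result = subprocess.run(
        ["ldd", plugin_path], capture_output=True, text=True, check=True
    )
    return references_library(result.stdout, library)


if __name__ == "__main__":
    # Path taken from the message; adjust for other installations.
    plugin = "/usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so"
    if os.path.exists(plugin):
        print(plugin_links_against(plugin, "libnvidia-ml.so"))
    else:
        print(f"{plugin} not found on this machine")
```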

[slurm-users] Re: [EXTERN] Re: Slurm and NVIDIA NVML

2024-11-13 Thread Matthias Leopold via slurm-users
@altoslabs.com> On Wed, Nov 13, 2024 at 10:21 AM Matthias Leopold via slurm-users <slurm-users@lists.schedmd.com> wrote: Hi, I'm trying to compile Slurm with NVIDIA NVML support, but the result is unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so,

[slurm-users] Slurm PID Files

2024-11-20 Thread Matthias Leopold via slurm-users
Hi, I compiled and installed Slurm 24.05 on Ubuntu 22.04 following this tutorial: https://www.schedmd.com/slurm/installation-tutorial/ Systemd service files are from deb packages that result from this. Do I have to worry that slurmctld and slurmd don't write PID files although SlurmctldPidFil

[slurm-users] Slurm 24.05 and OpenMPI

2025-04-04 Thread Matthias Leopold via slurm-users
Hi, I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and NVIDIA deepops framework a couple of years ago. It is based on Ubuntu 20.04 and makes use of the NVIDIA pyxis/enroot container solution. For operational validation I used the nccl-tests application in a container. nccl-tests

[slurm-users] Re: [EXTERNAL] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI

2025-03-27 Thread Matthias Leopold via slurm-users
: *Davide DelVento *Date: *Thursday, March 27, 2025 at 7:41 AM *To: *Matthias Leopold *Cc: *Slurm User Community List *Subject: *[EXTERNAL] [slurm-users] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI Hi Matthias, I see. It does not freak me out. Unfortunately I have very little experience working wit

[slurm-users] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI

2025-03-27 Thread Matthias Leopold via slurm-users
ver-else-you-need" (which obviously may or may not be relevant for your case). Cheers, Davide On Wed, Mar 26, 2025 at 12:51 PM Matthias Leopold via slurm-users <slurm-users@lists.schedmd.com> wrote: Hi, I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and N

[slurm-users] Re: [EXTERNAL] Re: [EXTERN] Re: Slurm 24.05 and OpenMPI

2025-03-28 Thread Matthias Leopold via slurm-users
seeing the message you have in your original post? Howard On 3/27/25, 9:20 AM, "Matthias Leopold" <matthias.leop...@meduniwien.ac.at> wrote: Hi Howard, thanks, but my Slurm 24.05 definitely has pmix support (visible in "srun --mpi=list") and it uses it through "

[slurm-users] Re: [EXTERN] Slurm upgrade using Debian packages

2025-03-09 Thread Matthias Leopold via slurm-users
Thanks for all the replies. I'll take away the hints about running slurmctld/slurmdbd on separate nodes and disabling systemd units when upgrading (I had thought of that). Matthias On 06.03.25 at 17:04, Matthias Leopold via slurm-users wrote: Hi, I'm building Slurm Debian packages fr

[slurm-users] Slurm upgrade using Debian packages

2025-03-06 Thread Matthias Leopold via slurm-users
Hi, I'm building Slurm Debian packages from SchedMD sources using this tutorial https://www.schedmd.com/slurm/installation-tutorial/. Now I tried upgrading (minor release upgrade within 24.05) using these packages. https://slurm.schedmd.com/upgrades.html tells me to upgrade (a) slurmdbd (b) sl