Hi,
I compiled and installed Slurm 24.05 on Ubuntu 22.04 following this
tutorial: https://www.schedmd.com/slurm/installation-tutorial/
Systemd service files are from deb packages that result from this.
Do I have to worry that slurmctld and slurmd don't write PID files
although SlurmctldPidFil
@altoslabs.com>
On Wed, Nov 13, 2024 at 10:21 AM Matthias Leopold via slurm-users
<slurm-users@lists.schedmd.com> wrote:
Hi,
I'm trying to compile Slurm with NVIDIA NVML support, but the result is
unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so,
Hi,
I'm trying to compile Slurm with NVIDIA NVML support, but the result is
unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but when
I do "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" there is no
reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (which I would
expect).
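The check described above (run ldd on the plugin and look for libnvidia-ml) can be scripted. This is only a minimal sketch: the helper names `ldd` and `links_against` are made up for illustration and are not part of Slurm or any NVIDIA tooling. Keep in mind that ldd only lists link-time dependencies, so a library that a plugin loads at runtime via dlopen would not show up; this sketch cannot tell whether that is the case here.

```python
import subprocess


def ldd(path: str) -> str:
    """Run `ldd` on a shared object and return its stdout (hypothetical helper)."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def links_against(ldd_output: str, library: str) -> bool:
    """Return True if any line of ldd output names a library starting with `library`."""
    for line in ldd_output.splitlines():
        # Typical line: "libfoo.so.1 => /lib/x86_64-linux-gnu/libfoo.so.1 (0x...)"
        first_field = line.strip().split(" ", 1)[0]
        # Some entries are absolute paths (e.g. the dynamic loader); keep the basename.
        name = first_field.rsplit("/", 1)[-1]
        if name.startswith(library):
            return True
    return False


if __name__ == "__main__":
    # Path taken from the message above; adjust for your installation.
    plugin = "/usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so"
    print(links_against(ldd(plugin), "libnvidia-ml.so"))
```

Running it on a node where the plugin is installed prints True or False depending on whether ldd reports a link-time dependency on libnvidia-ml.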
Hi,
I need to take care of a 17.02 Slurm cluster (I'm preparing it for
upgrades). I see that slurmdbd logs various "cluster not registered"
messages at startup (DBD_CLUSTER_TRES,DBD_JOB_START,DBD_STEP_START), but
I don't see a real problem. Accounting works. Do I have to worry? Can
this be re
Hi,
I want to change Gres definition for a Node
from
NodeName=s0-n10 Gres=gpu:a100:5
to
NodeName=s0-n10 Gres=gpu:a100-sxm4-80gb:5
-> The hardware stays the same, only the Gres name changes; a100-sxm4-80gb is
already defined in the cluster.
If I do this online, will it affect running jobs on the node?
Slur
Hi,
not sure if this is the right place:
Our Slurm 21.08 is compiled against NVML from CUDA 11.4 for
"AutoDetect=nvml" support in gres.conf. Currently we use A100 GPUs. I
would like to know whether we could use H100 GPUs with this setup or whether
I need a newer NVML (what version?). I didn't find anything
On 05/07/2023 17:17, Matthias Leopold wrote:
Thanks, but unfortunately that didn't help.
Regards,
Matthias
On 05.07.23 at 17:59, Marko Markoc wrote:
Hi Matthias,
Before you start digging deeper into this, I would recommend
restarting the `slurmctld` service. I've had simila
't enough for certain configuration changes.
Regards,
Marko
On Tue, Jul 4, 2023 at 3:57 AM Matthias Leopold
<matthias.leop...@meduniwien.ac.at> wrote:
Hi,
I'm trying to use AllowGroups for partition configuration in my Slurm
21.08 cluster. Unexpectedly this
Hi,
I'm trying to use AllowGroups for partition configuration in my Slurm
21.08 cluster. Unexpectedly, this doesn't seem to work. My user can't
submit jobs although he is a member of the group mentioned in AllowGroups:
srun: error: Unable to allocate resources: User's group not permitted to
use thi
|The Rachel and Selim Benin School
[] /\ |of Computer Science and Engineering
[]//\\/ |The Hebrew University of Jerusalem
[// \\ |T +972-2-5494522 | F +972-2-5494522
// \ |ir...@cs.huji.ac.il <mailto:ir...@cs.huji.ac.il>
// |
--
Matthias Leopold
Hi,
I want to access the kernel "user" keyrings inside a Slurm job on an
Ubuntu 20.04 node. I'm not an expert on keyrings (yet), I just
discovered that inside a Slurm job a keyring for "user: invocation_id"
is used, which seems to be shared across all users of the executing
Slurm node (other u
Hi,
I know this might be too simple a question for a bigger topic, but I'll
just try: is there something like seff for measuring the efficiency of
NVIDIA GPU usage in Slurm jobs?
thx
Matthias
ives me
everything I want, sorry for bothering you.
Matthias
On 27.01.22 at 16:27, Matthias Leopold wrote:
Hi,
we have 2 DGX A100 systems which we would like to use with Slurm. We
want to use the MIG feature for _some_ of the GPUs. As I somehow
suspected I couldn't find a working setup
devices. But there are downsides like no multi node MPI
jobs and in general I still can't believe there is such a limitation.
thx again for any feedback
Matthias
On 27.01.22 at 16:27, Matthias Leopold wrote:
Hi,
we have 2 DGX A100 systems which we would like to use with Slurm. We
want t
Hi,
we have 2 DGX A100 systems which we would like to use with Slurm. We
want to use the MIG feature for _some_ of the GPUs. As I somehow
suspected I couldn't find a working setup for this in Slurm yet. I'll
describe the configuration variants I tried after creating the MIG
instances, it migh
On 12.01.22 at 17:54, Matthias Leopold wrote:
Hi,
I'm compiling Slurm with ansible playbooks from NVIDIA deepops framework
(https://github.com/NVIDIA/deepops). I'm trying to add UCX support. How
can I tell if UCX is actually included in the resulting binaries
(without actu
Hi,
I'm compiling Slurm with ansible playbooks from NVIDIA deepops framework
(https://github.com/NVIDIA/deepops). I'm trying to add UCX support. How
can I tell if UCX is actually included in the resulting binaries
(without actually using Slurm)? I was looking at executables and *so
files with
Hi,
I'm trying to prepare for using Slurm with DGX A100 systems with MIG
configuration. I will have several gres:gpu types there so I tried to
reproduce the situation described in "Specific limits over GRES" from
https://slurm.schedmd.com/resource_limits.html, but I can't.
In my test environ
at is expected behavior,
but it would keep you from having to do something with a plugin.
Jeff
*From:* slurm-users on behalf of
Matthias Leopold
*Sent:* Thursday, April 22, 2021 5:13 AM
*To:* Slurm User Community List
*Su
Hi,
I'm testing how limiting memory resources works in Slurm.
I'm using TaskPlugin=affinity,cgroup (slurm.conf) and
ConstrainRAMSpace=yes (cgroup.conf) and have set a MaxMemPerCPU limit on
the partition.
To my surprise MaxMemPerCPU is enforced as long as the job submission
requests a memory li
Hi,
can someone please explain to me why it's possible to set Grp* resource
limits on user associations? What's the use for this? As far as I
understand the documentation, accounts can have children, but users cannot.
I'm still a newbie exploring Slurm in a test environment, please excuse
maybe stup
I had
to do it and had no hints)
Sorry for bothering you
Matthias
On 06.04.21 at 17:06, Matthias Leopold wrote:
Hi,
I'm very new to Slurm and am trying to understand basic concepts. One of them
is the "Multifactor Priority Plugin". For this I submitted some jobs and
looked at ss
Hi,
I'm very new to Slurm and am trying to understand basic concepts. One of them
is the "Multifactor Priority Plugin". For this I submitted some jobs and
looked at sshare output. To my surprise I don't get any numbers for
"RawUsage"; regardless of what I do, RawUsage stays 0 (same in "scontrol
show as