This problem turned out to be that the new node was on a different subnet than
the other nodes. Once our network admin opened up ports 6817, 6818, and 6188
between the subnets, the new node worked.
Thanks for all the responses.
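One quick way to confirm this kind of subnet problem is to try opening TCP connections to the ports in question from the affected node, and again in the other direction from the controller. The Python sketch below is illustrative only. The helper names are hypothetical, and it assumes the ports are TCP. 6817 and 6818 are Slurm's default slurmctld and slurmd ports, and 6188 is listed as written in the message above.

```python
import socket

# Ports named in the thread: 6817 (slurmctld default) and 6818 (slurmd default);
# 6188 is copied as written in the message.
SLURM_PORTS = (6817, 6818, 6188)


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and host names that fail to resolve.
        return False


def check_ports(host: str, ports=SLURM_PORTS, timeout: float = 2.0) -> dict:
    """Map each port to whether it is reachable from this machine."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, running check_ports("controller") on the new node (where "controller" is a placeholder host name) should show 6817 as reachable once the firewall between the subnets allows it. A port reported as closed can mean either a blocked path or a daemon that simply isn't running, so check that the daemon is up before blaming the network.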
From: slurm-users On Behalf Of Riebs, Andy
Sent: Friday, Ap
Even for users other than slurm and munge? It seems strange that 3 of 4 worker
nodes work with the same UIDs/GIDs as the non-working node.
-Original Message-
From: slurm-users On Behalf Of Christopher Samuel
Sent: Wednesday, April 22, 2020 2:27 PM
To: slurm-users@lists.schedmd.com
Sub
There is a third user account on all machines in the cluster, which is the
account used to access the cluster. That account has uid 1000 on all four
worker nodes, but on the controller it is 1001. So that is probably why the
question marks appear.
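Slurm expects UIDs and GIDs to be consistent across the cluster, so a mismatch like this one (1000 on the workers, 1001 on the controller) is worth checking for systematically. Here is a minimal sketch. It assumes you have already collected the text of each node's /etc/passwd into a dict keyed by node name. The function name is hypothetical, and only differing IDs are flagged. Users missing from some nodes are not.

```python
from collections import defaultdict


def id_mismatches(passwd_by_node: dict) -> dict:
    """Return {user: {node: (uid, gid)}} for users whose IDs differ between nodes.

    passwd_by_node maps a node name to the text of that node's /etc/passwd.
    Users missing from some nodes are not flagged; only differing IDs are.
    """
    ids = defaultdict(dict)  # user -> {node: (uid, gid)}
    for node, passwd_text in passwd_by_node.items():
        for line in passwd_text.splitlines():
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.split(":")
            if len(fields) < 4:
                continue  # skip malformed entries
            ids[fields[0]][node] = (int(fields[2]), int(fields[3]))
    # Keep only users seen with more than one distinct (uid, gid) pair.
    return {user: per_node for user, per_node in ids.items()
            if len(set(per_node.values())) > 1}
```

You could feed in the output of `getent passwd` from each node instead of the raw file. It uses the same colon-separated format and also covers accounts that come from LDAP or other NSS sources.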
I doubt this is the issue when 3 of the 4 nodes that work
The uid and gid are the same for the slurm and munge users on each node. The
two new nodes, one of which can’t connect with the controller, have the same
users and were created with the same sequence of steps. The only exception is
that the node that won’t connect has the software stack to com
I believe that in order to compile with NVML support you'll have to compile on
a system with an NVIDIA GPU installed; otherwise the NVIDIA driver and
libraries won't install on that system.
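As a rough pre-check before building, you can see whether a machine has the pieces an NVML-enabled build needs: the libnvidia-ml library, which is installed with the NVIDIA driver, and the nvml.h header, which ships with the CUDA toolkit. This is a sketch only. The include directories searched are assumptions, and it does not replicate what Slurm's configure script actually tests.

```python
import ctypes.util
import os

# Assumed header locations; adjust for your distribution or CUDA install.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/cuda/include")


def nvml_build_prereqs(include_dirs=DEFAULT_INCLUDE_DIRS) -> dict:
    """Report whether the NVML library and header appear to be installed."""
    # find_library("nvidia-ml") asks the system for libnvidia-ml.so*, which is
    # installed together with the NVIDIA driver.
    lib_found = ctypes.util.find_library("nvidia-ml") is not None
    # nvml.h is looked up only in the directories given above.
    header_found = any(
        os.path.isfile(os.path.join(d, "nvml.h")) for d in include_dirs
    )
    return {"libnvidia-ml": lib_found, "nvml.h": header_found}


if __name__ == "__main__":
    for item, present in nvml_build_prereqs().items():
        print(f"{item}: {'found' if present else 'missing'}")
```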
-Original Message-
From: slurm-users On Behalf Of Christopher Samuel
Sent: Tuesday, April 7, 2020 10:08 PM
To:
There are GPU plugins that won't be built unless you build on a node that has
the Nvidia drivers installed.
-Original Message-
From: slurm-users On Behalf Of Brian Andrus
Sent: Friday, February 28, 2020 7:36 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Hybrid compiling op
The firewalls are disabled on all nodes on my cluster, so I don't think it is a
firewall issue. It's probably our network security between the wired part of
our network and the wireless side. When I put the nodes back on a wired
controller, they work again.
-Original Message-
From: slu
I just checked the .deb package that I build from source and there is nothing
in it that has nv or cuda in its name.
Are you sure that slurm distributes nvidia binaries?
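To repeat the check described above on a package, you can list its contents (for example with `dpkg -c package.deb`) and filter the file names. Here is a minimal sketch. It assumes dpkg -c style output, where the path is the last field, or the field before ` -> ` for symlinks, and that paths contain no spaces. The keyword list is also an assumption. It is narrowed from "nv" to "nvml" because "nv" alone would also match names like "env".

```python
import os

# Assumed keywords; "nv" alone would also match unrelated names such as "env".
KEYWORDS = ("nvml", "cuda")


def gpu_related_files(listing_lines, keywords=KEYWORDS):
    """Return paths from a `dpkg -c`-style listing whose file name contains a keyword."""
    hits = []
    for line in listing_lines:
        line = line.strip()
        if not line:
            continue
        # Drop a symlink target ("path -> target"), then take the last field as the path.
        path = line.split(" -> ")[0].split()[-1]
        name = os.path.basename(path).lower()
        if any(k in name for k in keywords):
            hits.append(path)
    return hits
```

An empty result is consistent with NVML support not having been built. It does not say anything about whether NVIDIA's own binaries would be needed at run time.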
-Original Message-
From: slurm-users On Behalf Of Stephan Roth
Sent: Friday, February 7, 2020 2:23 AM
To: slurm-user
I didn't know that slurm had nvml linked into it. I build slurm from source
and didn't notice that nvml was part of the build. I'll check on that again.
-Original Message-
From: slurm-users On Behalf Of Stephan Roth
Sent: Friday, February 7, 2020 2:23 AM
To: slurm-users@lists.schedmd
This started working for me this morning. I have no idea why it started to
work. Maybe it was multiple restarts of the various daemons that did it.
-Original Message-
From: slurm-users On Behalf Of Brian W. Johanson
Sent: Tuesday, February 4, 2020 1:35 PM
To: slurm-users@lists.sche
I've already restarted slurmctld and slurmd on all nodes. Still get the same
problem.
-Original Message-
From: slurm-users On Behalf Of Marcus Wagner
Sent: Tuesday, February 4, 2020 2:31 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] sbatch script won't accept --gres t
If you look at the languages supported for server (endpoint) generation, there
isn't an implementation in C or even C++. The client code generator has a
cpprest implementation (C++).
They may have included swagger in those slides just to help people understand
what a REST endpoint is.
For my p
So that would be a top-down approach where they define the API first using
OpenAPI and then generate the code from that. I was planning on just writing
the REST client code manually, but I could try putting the REST API I'll be
calling into the swagger-codegen page and see what kind of client s
Thanks for that.
Slurmrestd will be a REST endpoint. I need a C library for REST clients that
will call REST endpoints. Maybe the library that the slurm team is using for
slurmrestd will support both endpoints and clients.
I'm working on the 19.05.4 source code since it is stable, but I would
Is that logged somewhere or do I need to capture the output from the make
command to a file?
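For context, configure writes its own log to config.log, but make keeps no log unless you redirect its output yourself. One option is a small wrapper that echoes the build output and copies it to a file at the same time, much like piping through tee. This is a sketch, and the function and log file names are hypothetical.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, echo its combined stdout/stderr and copy it to log_path.

    Returns the command's exit status.
    """
    with open(log_path, "w") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:  # read line by line so progress stays visible
            sys.stdout.write(line)
            log.write(line)
    # Leaving the Popen context waits for the process, so returncode is set here.
    return proc.returncode
```

For example, run_and_log(["make"], "make.log") returns make's exit status and leaves the full build output, including compiler errors written to stderr, in make.log.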
-Original Message-
From: slurm-users On Behalf Of Kurt H Maier
Sent: Wednesday, December 11, 2019 6:32 PM
To: Slurm User Community List
Subject: Re: [slurm-users] Need help with controller issues