Re: [slurm-users] Munge decode failing on new node

2020-05-14 Thread dean.w.schulze
The problem turned out to be that the new node was on a different subnet than the other nodes. Once our network admin opened up ports 6817, 6818, and 6188 between the subnets, the new node worked. Thanks for all the responses. From: slurm-users On Behalf Of Riebs, Andy Sent: Friday, Ap
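
A quick way to confirm that kind of cross-subnet blockage is to try a plain TCP connection to each port from the affected node. The following is only a minimal sketch: the port numbers are the ones quoted in the message above, while the host name `controller.example.org` and the helper names `port_is_open` and `check_ports` are hypothetical placeholders, not part of Slurm.

```python
import socket

# Ports quoted in the message above. The host name used further down is a
# hypothetical placeholder; substitute the controller or node to be tested.
PORTS = (6817, 6818, 6188)


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports=PORTS, timeout=2.0):
    """Map each port to True (reachable) or False (blocked or closed)."""
    return {port: port_is_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    for port, ok in check_ports("controller.example.org").items():
        print(f"port {port}: {'reachable' if ok else 'blocked or closed'}")
```

A port that reports as blocked here only shows that nothing answered from this machine. It does not distinguish a network filter between subnets from a daemon that is simply not listening, so it is worth running the same check from a node that is known to work.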

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread dean.w.schulze
Even for users other than slurm and munge? It seems strange that 3 of 4 worker nodes work with the same UIDs/GIDs as the non-working nodes. -Original Message- From: slurm-users On Behalf Of Christopher Samuel Sent: Wednesday, April 22, 2020 2:27 PM To: slurm-users@lists.schedmd.com Sub

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread dean.w.schulze
There is a third user account on all machines in the cluster that is the user account for using the cluster. That account has uid 1000 on all four worker nodes, but on the controller it is 1001. So that is probably why the question marks. I doubt this is the issue when 3 of the 4 nodes that work
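
The UID comparison described above can be sketched as a small script. This is an illustration under stated assumptions, not a Slurm or MUNGE tool: the host names, the account name `clusteruser`, and the `munge` UID value are hypothetical, and only the 1000 versus 1001 split comes from the message. The table would have to be filled in by hand, for example from `id -u <user>` run on each machine.

```python
def find_uid_mismatches(uids_by_host):
    """Return {user: {host: uid}} for every user whose UID differs across hosts."""
    per_user = {}
    for host, table in uids_by_host.items():
        for user, uid in table.items():
            per_user.setdefault(user, {})[host] = uid
    return {
        user: hosts
        for user, hosts in per_user.items()
        if len(set(hosts.values())) > 1
    }


if __name__ == "__main__":
    # Hypothetical host names, account name, and munge UID. The 1000/1001
    # split for the cluster account mirrors the situation in the message.
    uids = {
        "controller": {"munge": 64030, "clusteruser": 1001},
        "worker1": {"munge": 64030, "clusteruser": 1000},
        "worker2": {"munge": 64030, "clusteruser": 1000},
    }
    for user, hosts in find_uid_mismatches(uids).items():
        print(f"{user}: {hosts}")
```

With the sample table only `clusteruser` is reported, since its UID on the controller differs from the workers while the `munge` entries agree everywhere.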

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread dean.w.schulze
The uid and gid are the same for the slurm and munge users on each node. The two new nodes, one of which can’t connect with the controller, have the same users and were created with the same sequence of steps. The only exception is that the node that won’t connect has the software stack to com

Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08 Thread dean.w.schulze
I believe that in order to compile for nvml you'll have to compile on a system with an Nvidia GPU installed; otherwise the Nvidia driver and libraries won't install on that system. -Original Message- From: slurm-users On Behalf Of Christopher Samuel Sent: Tuesday, April 7, 2020 10:08 PM To:

Re: [slurm-users] Hybrid compiling options

2020-02-29 Thread dean.w.schulze
There are GPU plugins that won't be built unless you build on a node that has the Nvidia drivers installed. -Original Message- From: slurm-users On Behalf Of Brian Andrus Sent: Friday, February 28, 2020 7:36 PM To: slurm-users@lists.schedmd.com Subject: [slurm-users] Hybrid compiling op

Re: [slurm-users] Which ports does slurm use?

2020-02-07 Thread dean.w.schulze
The firewalls are disabled on all nodes on my cluster so I don't think it is a firewall issue. It's probably our network security between the wired part of our network and the wireless side. When I put the nodes back on a wired controller they work again. -Original Message- From: slu

Re: [slurm-users] How to use Autodetect=nvml in gres.conf

2020-02-07 Thread dean.w.schulze
I just checked the .deb package that I build from source and there is nothing in it that has nv or cuda in its name. Are you sure that slurm distributes nvidia binaries? -Original Message- From: slurm-users On Behalf Of Stephan Roth Sent: Friday, February 7, 2020 2:23 AM To: slurm-user

Re: [slurm-users] How to use Autodetect=nvml in gres.conf

2020-02-07 Thread dean.w.schulze
I didn't know that slurm had nvml linked into it. I build slurm from source and didn't notice that nvml was part of the build. I'll check on that again. -Original Message- From: slurm-users On Behalf Of Stephan Roth Sent: Friday, February 7, 2020 2:23 AM To: slurm-users@lists.schedmd

Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

2020-02-04 Thread dean.w.schulze
This started working for me this morning. I have no idea why it started to work. Maybe it was multiple restarts of the various daemons that did it. -Original Message- From: slurm-users On Behalf Of Brian W. Johanson Sent: Tuesday, February 4, 2020 1:35 PM To: slurm-users@lists.sche

Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

2020-02-04 Thread dean.w.schulze
I've already restarted slurmctld and slurmd on all nodes. Still get the same problem. -Original Message- From: slurm-users On Behalf Of Marcus Wagner Sent: Tuesday, February 4, 2020 2:31 AM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] sbatch script won't accept --gres t

Re: [slurm-users] Question about slurm source code and libraries

2020-01-27 Thread dean.w.schulze
If you look at the languages supported for server (endpoint) generation there isn't an implementation in C or even C++. The client code generator has a cpprest implementation (C++). They may have included swagger in those slides just to help people understand what a REST endpoint is. For my p

Re: [slurm-users] Question about slurm source code and libraries

2020-01-25 Thread dean.w.schulze
So that would be a top-down approach where they define the API first using OpenAPI and then generate the code from that. I was planning on just writing the REST client code manually, but I could try putting the REST API I'll be calling into the swagger-codegen page and see what kind of client s

Re: [slurm-users] Question about slurm source code and libraries

2020-01-25 Thread dean.w.schulze
Thanks for that. Slurmrestd will be a REST endpoint. I need a C library for REST clients that will call REST endpoints. Maybe the library that the slurm team is using for slurmrestd will support both endpoints and clients. I'm working on the 19.05.4 source code since it is stable, but I would

Re: [slurm-users] Need help with controller issues

2019-12-11 Thread dean.w.schulze
Is that logged somewhere or do I need to capture the output from the make command to a file? -Original Message- From: slurm-users On Behalf Of Kurt H Maier Sent: Wednesday, December 11, 2019 6:32 PM To: Slurm User Community List Subject: Re: [slurm-users] Need help with controller issues