[slurm-users] creating a reservation for a gres resources e.g. GPU?

2020-04-22 Thread Alastair Neil
Hi there, Slurm version 18.08. I am trying to find out whether there is a way to add a specific gres, in this case a GPU, to a reservation. I think I can reserve a portion of a node that has a specific gres quantity attached, but I cannot figure out how to reserve the gres, so I cannot guarantee that i

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread dean.w.schulze
The uid and gid are the same for the slurm and munge users on each node. The two new nodes, one of which can’t connect with the controller, have the same users and were created with the same sequence of steps. The only exception is that the node that won’t connect has the software stack to com

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread dean.w.schulze
There is a third user account on all machines in the cluster that is the user account for using the cluster. That account has uid 1000 on all four worker nodes, but on the controller it is 1001. So that is probably why the question marks. I doubt this is the issue when 3 of the 4 nodes that work

[slurm-users] One node won't connect and false positive messages from slurm every 1 minute 40 seconds

2020-04-22 Thread Dean Schulze
I added two new nodes to my cluster (5 nodes total including controller). One of the new nodes works, but the other one can't connect to the controller. Both new nodes were created the same way except that the one that can't connect to the controller has some extra packages installed to build slur

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread Christopher Samuel
On 4/22/20 12:56 PM, dean.w.schu...@gmail.com wrote: There is a third user account on all machines in the cluster that is the user account for using the cluster. That account has uid 1000 on all four worker nodes, but on the controller it is 1001. So that is probably why the question marks.

[slurm-users] floating condo partition, no pre-emption, guarantee a max pend time?

2020-04-22 Thread Paul Brunk
Hi all: [ BTW this is the same situation that the submitter of https://bugs.schedmd.com/show_bug.cgi?id=2692 presented. ] We have a non-Slurm cluster in production and are developing our next one, which will run Slurm 20.02.X. We have a partition "batch" which is open to all users. Half of th

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread dean.w.schulze
Even for users other than slurm and munge? It seems strange that 3 of 4 worker nodes work with the same UIDs/GIDs as the non-working nodes. -----Original Message----- From: slurm-users On Behalf Of Christopher Samuel Sent: Wednesday, April 22, 2020 2:27 PM To: slurm-users@lists.schedmd.com Sub

Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-22 Thread Ole Holm Nielsen
Hi Michael, Thanks for your insightful explanation of the Slurm RPM build process! This clarified the topic a lot for me. I have updated my Slurm installation Wiki page based upon your information: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms /Ole On 21-04-2020 22

Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread Gennaro Oliva
Hi Dean,
On Wed, Apr 22, 2020 at 07:28:15PM -0600, dean.w.schu...@gmail.com wrote:
> Even for users other than slurm and munge? It seems strange that 3 of 4 worker nodes work with the same UIDs/GIDs as the non-working nodes.
As in: https://slurm.schedmd.com/quickstart_admin.html Super Quick