I've already restarted slurmctld and slurmd on all nodes. I still get the same problem.
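As a side check that the two config files quoted below agree on the GPU count, here is a minimal Python sketch. It is not a Slurm tool: the helper names `declared_gres` and `configured_gres` are made up for illustration, and it assumes the simple layout shown in this thread (a `Gres=name:type:count` field in slurm.conf and one `File=` entry per device in gres.conf, with no bracketed device ranges).

```python
import re
from collections import Counter

# Values copied from the slurm.conf and gres.conf quoted in this thread.
NODE_LINE = (
    "NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 "
    "CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770 Gres=gpu:gp100:4"
)
GRES_CONF = """\
Name=gpu Type=gp100 File=/dev/nvidia0
Name=gpu Type=gp100 File=/dev/nvidia1
Name=gpu Type=gp100 File=/dev/nvidia2
Name=gpu Type=gp100 File=/dev/nvidia3
"""


def declared_gres(node_line):
    """Count GRES declared in the Gres= field of a NodeName line.

    Hypothetical helper; only the name:type:count form is handled.
    """
    counts = Counter()
    match = re.search(r"\bGres=(\S+)", node_line)
    if not match:
        return counts
    for item in match.group(1).split(","):
        parts = item.split(":")
        if len(parts) == 3:
            name, gres_type, count = parts
            counts[(name, gres_type)] += int(count)
    return counts


def configured_gres(gres_conf_text):
    """Count gres.conf entries per (Name, Type).

    Hypothetical helper; assumes one device per File= entry.
    """
    counts = Counter()
    for line in gres_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "Name" in fields and "File" in fields:
            counts[(fields["Name"], fields.get("Type"))] += 1
    return counts


declared = declared_gres(NODE_LINE)
configured = configured_gres(GRES_CONF)
for key in sorted(set(declared) | set(configured)):
    print(key, "slurm.conf:", declared[key], "gres.conf:", configured[key])
```

For the configuration in this thread both files report the same count for gpu:gp100, so a plain mismatch between them does not look like the cause.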

-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Marcus Wagner
Sent: Tuesday, February 4, 2020 2:31 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

Hi Dean,

could you please try restarting slurmctld? This usually helps at our site. I have never seen this happen with gres, but I have seen it many other times. This is why we restart slurmctld once a day via a cron job.

Best
Marcus

On 2/4/20 12:59 AM, Dean Schulze wrote:
> When I run an sbatch script with the line
>
> #SBATCH --gres=gpu:gp100:1
>
> it runs. When I change it to
>
> #SBATCH --gres=gpu:gp100:3
>
> it fails with "Requested node configuration is not available". But I
> have a node with 4 gp100s available. Here's my slurm.conf:
>
> NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770 Gres=gpu:gp100:4
>
> That node has a gres.conf with these lines:
>
> Name=gpu Type=gp100 File=/dev/nvidia0
> Name=gpu Type=gp100 File=/dev/nvidia1
> Name=gpu Type=gp100 File=/dev/nvidia2
> Name=gpu Type=gp100 File=/dev/nvidia3
>
> The character devices all exist in /dev.
>
> What's the controller complaining about?

--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de