Thanks Michael. I will try 17.x as I also could not see anything wrong with my settings... Will report back afterwards...
Lou

On Tue, Dec 4, 2018 at 9:11 AM Michael Di Domenico <mdidomeni...@gmail.com> wrote:

> unfortunately, someone smarter than me will have to help further. I'm
> not sure i see anything specifically wrong. The one thing i might try
> is backing the software down to a 17.x release series. I recently
> tried 18.x and had some issues. I can't say whether it'll be any
> different, but you might be exposing an undiagnosed bug in the 18.x
> branch
>
> On Mon, Dec 3, 2018 at 4:17 PM Lou Nicotra <lnico...@interactions.com> wrote:
> >
> > Made the change in the gres.conf file on the local server and restarted
> > slurmd and slurmctld on master.... Unfortunately same error...
> >
> > Distributed corrected gres.conf to all k20 servers, restarted slurmd and
> > slurmctld... Still has same error...
> >
> > On Mon, Dec 3, 2018 at 4:04 PM Brian W. Johanson <bjoha...@psc.edu> wrote:
> >>
> >> Is that a lowercase k in k20 specified in the batch script and nodename
> >> and an uppercase K specified in gres.conf?
> >>
> >> On 12/03/2018 09:13 AM, Lou Nicotra wrote:
> >>
> >> Hi All, I have recently set up a slurm cluster with my servers and I'm
> >> running into an issue while submitting GPU jobs. It has something to do
> >> with gres configurations, but I just can't seem to figure out what is
> >> wrong. Non GPU jobs run fine.
> >>
> >> The error is as follows:
> >> sbatch: error: Batch job submission failed: Invalid Trackable RESource (TRES) specification
> >> after submitting a batch job.
> >>
> >> My batch job is as follows:
> >> #!/bin/bash
> >> #SBATCH --partition=tiger_1 # partition name
> >> #SBATCH --gres=gpu:k20:1
> >> #SBATCH --gres-flags=enforce-binding
> >> #SBATCH --time=0:20:00 # wall clock limit
> >> #SBATCH --output=gpu-%J.txt
> >> #SBATCH --account=lnicotra
> >> module load cuda
> >> python gpu1
> >>
> >> Where gpu1 is a GPU test script that runs correctly when invoked via python.
> >> Tiger_1 partition has servers with GPUs, with a mix of 1080GTX and K20
> >> as specified in slurm.conf
> >>
> >> I have defined GRES resources in the slurm.conf file:
> >> # GPU GRES
> >> GresTypes=gpu
> >> NodeName=tiger[01,05,10,15,20] Gres=gpu:1080gtx:2
> >> NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Gres=gpu:k20:2
> >>
> >> And have a local gres.conf on the servers containing GPUs...
> >> lnicotra@tiger11 ~# cat /etc/slurm/gres.conf
> >> # GPU Definitions
> >> # NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Name=gpu Type=K20 File=/dev/nvidia[0-1]
> >> Name=gpu Type=K20 File=/dev/nvidia[0-1] Cores=0,1
> >>
> >> and a similar one for the 1080GTX
> >> # GPU Definitions
> >> # NodeName=tiger[01,05,10,15,20] Name=gpu Type=1080GTX File=/dev/nvidia[0-1]
> >> Name=gpu Type=1080GTX File=/dev/nvidia[0-1] Cores=0,1
> >>
> >> The account manager seems to know about the GPUs...
> >> lnicotra@tiger11 ~# sacctmgr show tres
> >>     Type            Name     ID
> >> -------- --------------- ------
> >>      cpu                      1
> >>      mem                      2
> >>   energy                      3
> >>     node                      4
> >>  billing                      5
> >>       fs            disk      6
> >>     vmem                      7
> >>    pages                      8
> >>     gres             gpu   1001
> >>     gres         gpu:k20   1002
> >>     gres     gpu:1080gtx   1003
> >>
> >> Can anyone point out what I am missing?
> >>
> >> Thanks!
> >> Lou
> >>
> >> --
> >> Lou Nicotra
> >> IT Systems Engineer - SLT
> >> Interactions LLC
> >> o: 908-673-1833
> >> m: 908-451-6983
> >> lnico...@interactions.com
> >> www.interactions.com
> >>
> >> *******************************************************************************
> >> This e-mail and any of its attachments may contain Interactions LLC
> >> proprietary information, which is privileged, confidential, or subject to
> >> copyright belonging to the Interactions LLC. This e-mail is intended solely
> >> for the use of the individual or entity to which it is addressed.
> >> If you are not the intended recipient of this e-mail, you are hereby notified that
> >> any dissemination, distribution, copying, or action taken in relation to
> >> the contents of and attachments to this e-mail is strictly prohibited and
> >> may be unlawful. If you have received this e-mail in error, please notify
> >> the sender immediately and permanently delete the original and any copy of
> >> this e-mail and any printout. Thank You.
> >> *******************************************************************************
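
Brian's question about a lowercase k20 in slurm.conf versus an uppercase K20 in gres.conf can be checked mechanically. The following is a minimal sketch, not an official Slurm tool: the helper names are hypothetical, the parsing handles only the simple `Gres=name:type:count` and `Type=` forms quoted in this thread, and the config text is pasted in as strings rather than read from /etc/slurm. It only flags type names that differ by letter case; it does not establish the cause of the TRES error, which the thread reports as persisting after gres.conf was corrected.

```python
import re


def slurm_conf_gres_types(text):
    """Collect type names from Gres=name:type:count entries in slurm.conf text."""
    types = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # ignore comments
        for value in re.findall(r"\bGres=(\S+)", line):
            for entry in value.split(","):
                parts = entry.split(":")
                if len(parts) == 3:  # name:type:count
                    types.add(parts[1])
    return types


def gres_conf_types(text):
    """Collect Type= values from gres.conf text."""
    types = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # ignore comments
        match = re.search(r"\bType=(\S+)", line)
        if match:
            types.add(match.group(1))
    return types


def case_only_mismatches(slurm_types, gres_types):
    """Return (slurm.conf type, gres.conf type) pairs differing only by case."""
    return sorted(
        (s, g)
        for s in slurm_types
        for g in gres_types
        if s != g and s.lower() == g.lower()
    )


# Config lines copied from the original post.
SLURM_CONF = """\
GresTypes=gpu
NodeName=tiger[01,05,10,15,20] Gres=gpu:1080gtx:2
NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Gres=gpu:k20:2
"""

GRES_CONF = """\
# GPU Definitions
Name=gpu Type=K20 File=/dev/nvidia[0-1] Cores=0,1
"""

mismatches = case_only_mismatches(
    slurm_conf_gres_types(SLURM_CONF), gres_conf_types(GRES_CONF)
)
for slurm_type, gres_type in mismatches:
    print(f"slurm.conf uses '{slurm_type}' but gres.conf uses '{gres_type}'")
```

Run against the k20 node's files as posted, this reports the k20/K20 pair; the same check against the 1080GTX gres.conf would flag 1080gtx/1080GTX.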