Is that a lowercase k in the k20 specified in the batch script and the NodeName line, but an uppercase K in the Type specified in gres.conf?
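
One way to check for that kind of mismatch is to list the type names each file declares and compare them by eye. This is only a minimal sketch, assuming the files follow the layout quoted below; `gres_types` and `node_types` are hypothetical helper names, not Slurm commands, and whether the case difference is actually the cause of the error is not confirmed here.

```shell
# Hypothetical helpers (not part of Slurm): print the GPU type names
# found in each config file, skipping commented-out lines.

# Types declared in a gres.conf-style file (Type=<name>).
gres_types() {
    grep -v '^[[:space:]]*#' "$1" \
        | grep -o 'Type=[^[:space:]]*' \
        | cut -d= -f2 \
        | sort -u
}

# Types declared in a slurm.conf-style file (Gres=gpu:<name>:<count>).
node_types() {
    grep -v '^[[:space:]]*#' "$1" \
        | grep -o 'Gres=gpu:[^:[:space:]]*' \
        | cut -d: -f2 \
        | sort -u
}
```

Running `gres_types /etc/slurm/gres.conf` and `node_types /etc/slurm/slurm.conf` on a GPU node (the slurm.conf path is an assumption; only the gres.conf path appears in the original post) would show whether the spellings agree, including case.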

On 12/03/2018 09:13 AM, Lou Nicotra wrote:
Hi All, I have recently set up a Slurm cluster with my servers and I'm running into an issue when submitting GPU jobs. It has something to do with the gres configuration, but I just can't seem to figure out what is wrong. Non-GPU jobs run fine.

The error after submitting a batch job is as follows:
sbatch: error: Batch job submission failed: Invalid Trackable RESource (TRES) specification

My batch job is as follows:
#!/bin/bash
#SBATCH --partition=tiger_1   # partition name
#SBATCH --gres=gpu:k20:1
#SBATCH --gres-flags=enforce-binding
#SBATCH --time=0:20:00  # wall clock limit
#SBATCH --output=gpu-%J.txt
#SBATCH --account=lnicotra
module load cuda
python gpu1

Here gpu1 is a GPU test script that runs correctly when invoked directly via python. The tiger_1 partition has servers with GPUs, a mix of 1080GTX and K20, as specified in slurm.conf.

I have defined GRES resources in the slurm.conf file:
# GPU GRES
GresTypes=gpu
NodeName=tiger[01,05,10,15,20] Gres=gpu:1080gtx:2
NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Gres=gpu:k20:2

And I have a local gres.conf on the servers containing GPUs:
lnicotra@tiger11 ~# cat /etc/slurm/gres.conf
# GPU Definitions
# NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Name=gpu Type=K20 File=/dev/nvidia[0-1]
Name=gpu Type=K20 File=/dev/nvidia[0-1] Cores=0,1

and a similar one for the 1080GTX
# GPU Definitions
# NodeName=tiger[01,05,10,15,20] Name=gpu Type=1080GTX File=/dev/nvidia[0-1]
Name=gpu Type=1080GTX File=/dev/nvidia[0-1] Cores=0,1

The account manager seems to know about the GPUs...
lnicotra@tiger11 ~# sacctmgr show tres
    Type            Name     ID
-------- --------------- ------
     cpu                      1
     mem                      2
  energy                      3
    node                      4
 billing                      5
      fs            disk      6
    vmem                      7
   pages                      8
    gres             gpu   1001
    gres         gpu:k20   1002
    gres     gpu:1080gtx   1003

Can anyone point out what I am missing?

Thanks!
Lou

