I have GRES GPUs of type foolsgold (4 per node) that I am trying to allocate
one per job. However, requesting 1 GPU appears to allocate all of the node's
GPUs to that job. My batch script is:
#!/bin/bash
#SBATCH --partition=scavenge
#SBATCH --qos=scavenge
#SBATCH --account=borrowed
#SBATCH --nodes=1
#SBATCH --tasks=1
#SBATCH --time=00:05:20
#SBATCH --gpus=foolsgold:1
date
hostname -s
for ((i=1;i<=1000000000;i++)) ; do a=$((i++)) ; done
date
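
For reference, here is a minimal sketch of how the allocation could be checked from inside the job, for example right after "hostname -s" in the script above. It assumes Slurm exports SLURM_JOB_GPUS and CUDA_VISIBLE_DEVICES for these placeholder devices, which I have not confirmed; report_gpus is just a name made up for illustration.

```shell
#!/bin/bash
# Sketch: report which GPUs Slurm handed to this job.
# report_gpus is a hypothetical helper name, not part of Slurm.
report_gpus() {
    # Normally set for batch jobs that were allocated GPUs (assumed here).
    echo "SLURM_JOB_GPUS=${SLURM_JOB_GPUS:-unset}"
    # Normally set by the gres/gpu plugin to the usable device indices.
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
    # Detailed job record; only queried when running inside a Slurm job.
    if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
        scontrol show job -d "$SLURM_JOB_ID" | grep -i -E 'gres|tres' || true
    fi
}

report_gpus
```

If the job really holds a single GPU, both variables should list one index, and the detailed job record should show one device rather than four.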

And the partition definition is:
PartitionName=scavtres Nodes=saga-test01,saga-test02 MaxTime=72:00:00 State=UP PriorityTier=0 PreemptMode=REQUEUE AllowQos=scavenge AllowAccounts=borrowed,gaia default=yes TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/foolsgold=200.0" OverSubscribe=FORCE

I have 2 compute nodes in this test cluster, each one with 4 GPUs defined:
    NodeName=saga-test01 CPUS=2 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1800 State=UNKNOWN Gres=gpu:foolsgold:4
    NodeName=saga-test02 CPUS=2 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1800 State=UNKNOWN Gres=gpu:foolsgold:4

The /etc/slurm/gres.conf on the two compute nodes:
Name=gpu Type=foolsgold File=/tmp/fg0
Name=gpu Type=foolsgold File=/tmp/fg1
Name=gpu Type=foolsgold File=/tmp/fg2
Name=gpu Type=foolsgold File=/tmp/fg3

How can I get exactly one GPU allocated per job?

Thanks,

Erik
