The configuration below works for CPUs: with OverSubscribe, I can have more 
than 4 processes in the running state. But if I add #SBATCH --gres=gpu:2 to 
the job file, only 1 process is running and the others stay pending.
Does OverSubscribe apply only to CPU resources, or can it also be used for 
GPUs?


slurm.conf
# COMPUTE NODES
#DefMemPerCPU=100
NodeName=localhost Feature=gpu_shared Gres=gpu:2 CPUs=4 RealMemory=5000 State=UNKNOWN
PartitionName=compute Nodes=localhost OverSubscribe=YES Default=YES DefMemPerCPU=1000 MaxTime=INFINITE State=UP

Job file
#!/bin/bash
#SBATCH --job-name cifar10
#SBATCH --partition compute
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH -C gpu_shared
#SBATCH --oversubscribe
env
sleep 100
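
To make the GPU case easy to reproduce, here is a minimal sketch. It writes the job file above with the --gres=gpu:2 line added, submits several copies, and lists their states. The file name gpu_job.sh and the count of five submissions are assumptions chosen only for illustration; the submit step is skipped if sbatch is not on the PATH.

```shell
#!/bin/bash
# Write the GPU variant of the job file (gpu_job.sh is a hypothetical name).
# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name cifar10
#SBATCH --partition compute
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH -C gpu_shared
#SBATCH --oversubscribe
#SBATCH --gres=gpu:2
env
sleep 100
EOF

# Submit five copies (an arbitrary count, more than the node's 4 CPUs)
# and show which jobs are running and which are pending.
if command -v sbatch >/dev/null 2>&1; then
    for i in 1 2 3 4 5; do
        sbatch gpu_job.sh
    done
    squeue --partition compute
fi
```

Removing the --gres line from gpu_job.sh and resubmitting gives the CPU-only case for comparison.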


