The configuration below works for CPUs: with OverSubscribe, I can have more than 4 processes in the running state. But if I add #SBATCH --gres=gpu:2 to the job file, only 1 process is running and the others stay pending. Does OverSubscribe apply only to CPU resources, or can it also be used for GPUs?
slurm.conf:

# COMPUTE NODES
#DefMemPerCPU=100
NodeName=localhost Feature=gpu_shared Gres=gpu:2 CPUs=4 RealMemory=5000 State=UNKNOWN
PartitionName=compute Nodes=localhost OverSubscribe=YES Default=YES DefMemPerCPU=1000 MaxTime=INFINITE State=UP

Job file:

#SBATCH --job-name cifar10
#SBATCH --partition compute
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH -C gpu_shared
#SBATCH --oversubscribe
env sleep 100
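For reference, here is a sketch of the GPU variant of the job file that shows the pending behaviour. It assumes the only change is the added --gres line; the `#!/bin/bash` first line is also an assumption, since the job file above does not show one.

```shell
#!/bin/bash
# Assumed interpreter line; not shown in the original job file.
#SBATCH --job-name cifar10
#SBATCH --partition compute
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH -C gpu_shared
#SBATCH --oversubscribe
# The only intended difference from the CPU-only job file:
# request both GPUs that the node advertises (Gres=gpu:2).
#SBATCH --gres=gpu:2
env sleep 100
```

Submitting several copies of this with sbatch and then listing them with squeue is how I see one job running and the rest pending.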
