I think all you’re looking for is Generic Resource (GRES) scheduling, starting at https://slurm.schedmd.com/gres.html — if you’ve already seen that, then more details would be helpful.
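As a rough sketch of the job side, assuming the admins have already defined the four GPUs as a "gpu" GRES on that node, a batch script could request one GPU per job as shown here. The job name, the report_gpus helper, and the my_gpu_app binary are placeholders of mine, not anything from your cluster.

```shell
#!/bin/bash
# Hypothetical job name, used only for illustration.
#SBATCH --job-name=one-gpu
# Request exactly one GPU (generic resource) for this job.
#SBATCH --gres=gpu:1
# A single task, since the application can only use one GPU.
#SBATCH --ntasks=1

# Report which GPU(s) the job can see. Slurm normally exports
# CUDA_VISIBLE_DEVICES for GPU GRES allocations; "none" is printed
# if the variable is unset (for example, when run outside Slurm).
report_gpus() {
    echo "Assigned GPU(s): ${CUDA_VISIBLE_DEVICES:-none}"
}

report_gpus

# Replace with the real single-GPU application (placeholder name):
# ./my_gpu_app
```

Submitting that script six times and then running 'squeue -u $USER' should show whether four jobs are in state R and two in PD, provided the node really is configured with gpu:4.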
If it all works correctly, then ‘sbatch --gres=gpu scriptname’ should run up to 4 of those jobs and leave the rest pending.

-- 
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University

> On Mar 20, 2019, at 6:05 PM, Nicholas Yue <yue.nicho...@gmail.com> wrote:
>
> Hi,
>
> I am new to SLURM.
>
> I have access to a cluster where one of the node has 4 GPUs
>
> We are running version SLURM 17.11.12
>
> Is there some SBATCH token=value pair value I can use to submit jobs (each of which has an application that is only able to utilize 1 GPU) so that if I submit 6 copies, 4 copies will be dispatched and the 2 remaining will be in a state e.g. PD, until a GPU frees up
>
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla P100-PCIE...  On   | 00000000:25:00.0 Off |                    0 |
> | N/A   29C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla P100-PCIE...  On   | 00000000:59:00.0 Off |                    0 |
> | N/A   26C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla P100-PCIE...  On   | 00000000:6D:00.0 Off |                    0 |
> | N/A   27C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla P100-PCIE...  On   | 00000000:99:00.0 Off |                    0 |
> | N/A   31C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> Cheers
> --
> Nicholas Yue
> Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
> Custom Dev - C++ porting, OSX, Linux, Windows
> http://au.linkedin.com/in/nicholasyue
> https://vimeo.com/channels/naiadtools