Thanks, Michael. I have noticed a couple of questions on the mailing list mentioning GRES lately. I will share that information with our SLURM administrators.

Cheers

On Thu, 21 Mar 2019 at 12:56, Renfro, Michael <ren...@tntech.edu> wrote:

> I think all you’re looking for is Generic Resource (GRES) scheduling,
> starting at https://slurm.schedmd.com/gres.html; if you’ve already seen
> that, then more details would be helpful.
>
> If it all works correctly, then ‘sbatch --gres=gpu scriptname’ should run
> up to 4 of those jobs and leave the rest pending.
>
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
> 931 372-3601 / Tennessee Tech University
>
> On Mar 20, 2019, at 6:05 PM, Nicholas Yue <yue.nicho...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am new to SLURM.
> >
> > I have access to a cluster where one of the node has 4 GPUs
> >
> > We are running version SLURM 17.11.12
> >
> > Is there some SBATCH token=value pair value I can use to submit jobs
> > (each of which has an application that is only able to utilize 1 GPU) so
> > that if I submit 6 copies, 4 copies will be dispatched and the 2 remaining
> > will be in a state e.g. PD, until a GPU frees up
> >
> > +-----------------------------------------------------------------------------+
> > | NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
> > |-------------------------------+----------------------+----------------------+
> > | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> > | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> > |===============================+======================+======================|
> > |   0  Tesla P100-PCIE...  On   | 00000000:25:00.0 Off |                    0 |
> > | N/A   29C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> > |   1  Tesla P100-PCIE...  On   | 00000000:59:00.0 Off |                    0 |
> > | N/A   26C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> > |   2  Tesla P100-PCIE...  On   | 00000000:6D:00.0 Off |                    0 |
> > | N/A   27C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> > |   3  Tesla P100-PCIE...  On   | 00000000:99:00.0 Off |                    0 |
> > | N/A   31C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> >
> > Cheers
> > --
> > Nicholas Yue
> > Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
> > Custom Dev - C++ porting, OSX, Linux, Windows
> > http://au.linkedin.com/in/nicholasyue
> > https://vimeo.com/channels/naiadtools

--
Nicholas Yue
Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
Custom Dev - C++ porting, OSX, Linux, Windows
http://au.linkedin.com/in/nicholasyue
https://vimeo.com/channels/naiadtools
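
For anyone following the thread, here is a minimal sketch of what a one-GPU job script might look like when using the `--gres` request Mike describes. It assumes the administrators have already configured a GRES named `gpu` on the 4-GPU node; the job name, the `gpu_report` helper, and the commented-out `./my_gpu_app` line are hypothetical placeholders, not anything from the cluster in question.

```shell
#!/bin/bash
#SBATCH --job-name=one-gpu-job   # hypothetical job name
#SBATCH --gres=gpu:1             # ask for exactly one GPU per job
#SBATCH --ntasks=1

# When a GPU GRES is allocated, Slurm normally exports CUDA_VISIBLE_DEVICES
# for the job. Outside Slurm the variable may be unset, hence the fallback.
gpu_report() {
    echo "Assigned GPU(s): ${CUDA_VISIBLE_DEVICES:-none}"
}

gpu_report

# ./my_gpu_app   # hypothetical application that can only use a single GPU
```

If the script were saved as, say, `one_gpu_job.sh` (again a hypothetical name) and submitted six times with `sbatch one_gpu_job.sh`, then, provided GRES is set up correctly, `squeue` should show up to four copies running and the remainder in state PD until a GPU frees up.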