I think all you’re looking for is Generic Resource (GRES) scheduling, which is 
documented starting at https://slurm.schedmd.com/gres.html. If you’ve already 
seen that and it didn’t work, then more details about what you tried would be 
helpful.
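
For reference, the admin-side setup that page describes looks roughly like the 
fragment below. This is only a sketch: the node name gpunode01 is made up, and 
I am assuming the four GPUs show up as /dev/nvidia0 through /dev/nvidia3. The 
node also has to be shareable between jobs (for example with 
SelectType=select/cons_res), otherwise the first job gets the whole node.

```
# slurm.conf (fragment; gpunode01 is a hypothetical node name)
GresTypes=gpu
NodeName=gpunode01 Gres=gpu:4   # plus that node's existing CPU/memory settings

# gres.conf on the GPU node (device paths are an assumption)
Name=gpu File=/dev/nvidia[0-3]
```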

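If GRES is configured for that node and it all works correctly, then ‘sbatch 
--gres=gpu scriptname’ (one GPU per job, the same as --gres=gpu:1) should run 
up to 4 of those jobs at a time and leave the rest pending until a GPU frees up.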
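
As a rough sketch of what the job script could look like: the job name and the 
commented-out application line are placeholders I made up, and report_gpu is 
just a helper for checking what the job sees. With the GPU GRES plugin, Slurm 
normally sets CUDA_VISIBLE_DEVICES to the GPU(s) allocated to the job.

```shell
#!/bin/bash
#SBATCH --job-name=one-gpu   # hypothetical job name
#SBATCH --gres=gpu:1         # ask for one GPU per job
#SBATCH --ntasks=1

# Print the job ID and the GPU(s) Slurm made visible to this job.
# Falls back to "none" when run outside of Slurm.
report_gpu() {
    echo "job ${SLURM_JOB_ID:-none} sees GPU(s): ${CUDA_VISIBLE_DEVICES:-none}"
}

report_gpu

# ./my_gpu_app   # hypothetical application; replace with the real command
```

Submitting that six times with sbatch and then checking ‘squeue -u $USER’ 
should show four jobs running and two in state PD, typically with a reason 
such as Resources or Priority.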

-- 
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601     / Tennessee Tech University

> On Mar 20, 2019, at 6:05 PM, Nicholas Yue <yue.nicho...@gmail.com> wrote:
> 
> Hi,
> 
>   I am new to SLURM.
> 
>   I have access to a cluster where one of the nodes has 4 GPUs.
> 
>   We are running SLURM version 17.11.12.
> 
>   Is there some SBATCH token=value pair I can use to submit jobs (each of 
> which runs an application that can only use 1 GPU) so that if I submit 6 
> copies, 4 copies will be dispatched and the remaining 2 will stay pending, 
> e.g. in state PD, until a GPU frees up?
> 
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla P100-PCIE...  On   | 00000000:25:00.0 Off |                    0 |
> | N/A   29C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla P100-PCIE...  On   | 00000000:59:00.0 Off |                    0 |
> | N/A   26C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla P100-PCIE...  On   | 00000000:6D:00.0 Off |                    0 |
> | N/A   27C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla P100-PCIE...  On   | 00000000:99:00.0 Off |                    0 |
> | N/A   31C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> 
> 
> Cheers
> -- 
> Nicholas Yue
> Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
> Custom Dev - C++ porting, OSX, Linux, Windows
> http://au.linkedin.com/in/nicholasyue
> https://vimeo.com/channels/naiadtools
