Hi Zhang,

Thanks for the quick reply.
Could you please guide me on specifying MIG partitions in gres.conf and in slurm.conf? My MIG configuration is as below:

root@rl-dgxs-r21-l2:~# sudo nvidia-smi mig -lgi
+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19        9          2:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       10          3:1     |
+-------------------------------------------------------+
|   0  MIG 2g.20gb         14        3          0:2     |
+-------------------------------------------------------+
|   0  MIG 3g.40gb          9        2          4:4     |
+-------------------------------------------------------+

root@rl-dgxs-r21-l2:~# nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-a044d304-28b2-c3f1-42ea-b9440d868231)
  MIG 3g.40gb Device 0: (UUID: MIG-d4514f04-e287-50e9-b3c4-c19fddbb9aa2)
  MIG 2g.20gb Device 1: (UUID: MIG-4f393220-5308-51f7-bd7a-322306593545)
  MIG 1g.10gb Device 2: (UUID: MIG-4d988c3e-160a-52f3-a3e1-8eeccfee4585)
  MIG 1g.10gb Device 3: (UUID: MIG-4ff411c0-c0e2-5b86-a3a4-e76a6b6491cb)
GPU 1: NVIDIA A100-SXM4-80GB (UUID: GPU-6aecec97-f63e-4815-3c20-503c4e82fa57)
GPU 2: NVIDIA A100-SXM4-80GB (UUID: GPU-2ab474df-49c4-531a-6580-cd44d9982d0a)
GPU 3: NVIDIA DGX Display (UUID: GPU-aa49c52b-640c-39b2-1cee-a0120d0b5fa7)
GPU 4: NVIDIA A100-SXM4-80GB (UUID: GPU-c47adbbe-7ef8-246c-f700-010542e60ad0)

Any suggestions on using MIG partitions in my Slurm jobs?

With Warm Regards
Ravi

From: Shunran Zhang
Sent: Thursday, November 30, 2023 9:50 PM
To: Ravi Konila ; Slurm User Community List
Subject: Re: [slurm-users] Autodetect of nvml is not working in

Hi Ravi,

Unfortunately, if the NVML flag is off at compile time (that is, when the maintainer built the apt package you installed), that part of the code is not in your binary. Your only options are to recompile Slurm yourself following the official documentation, or to find a repository that builds Slurm with NVML.

Good luck
S. Zhang

On December 1, 2023 at 00:51, Ravi Konila <ravibh...@gmail.com> wrote:

Hi Josef and Rob,

Thanks for the reply. I agree that cuda-nvml-devel was not present when I installed slurm-llnl on Ubuntu 22.04; I installed it later. I did not build Slurm myself but installed it with the apt install slurm command. Is there any way to make use of it after Slurm has been installed?

With Warm Regards
Ravi K.
Ph: +91-9901072688
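
For anyone following along: when NVML autodetection is unavailable, the MIG devices have to be identified by hand, and the nvidia-smi -L listing earlier in this thread is the usual starting point. Below is a minimal Python sketch, not an official Slurm or NVIDIA tool, that pulls the MIG profile names and UUIDs out of that listing. The function name parse_nvidia_smi_list and the regular expressions are assumptions made for illustration; the sample text is copied from the output above.

```python
import re

# Sample copied from the `nvidia-smi -L` output quoted in this thread.
SAMPLE = """\
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-a044d304-28b2-c3f1-42ea-b9440d868231)
  MIG 3g.40gb Device 0: (UUID: MIG-d4514f04-e287-50e9-b3c4-c19fddbb9aa2)
  MIG 2g.20gb Device 1: (UUID: MIG-4f393220-5308-51f7-bd7a-322306593545)
  MIG 1g.10gb Device 2: (UUID: MIG-4d988c3e-160a-52f3-a3e1-8eeccfee4585)
  MIG 1g.10gb Device 3: (UUID: MIG-4ff411c0-c0e2-5b86-a3a4-e76a6b6491cb)
GPU 1: NVIDIA A100-SXM4-80GB (UUID: GPU-6aecec97-f63e-4815-3c20-503c4e82fa57)
GPU 2: NVIDIA A100-SXM4-80GB (UUID: GPU-2ab474df-49c4-531a-6580-cd44d9982d0a)
GPU 3: NVIDIA DGX Display (UUID: GPU-aa49c52b-640c-39b2-1cee-a0120d0b5fa7)
GPU 4: NVIDIA A100-SXM4-80GB (UUID: GPU-c47adbbe-7ef8-246c-f700-010542e60ad0)
"""

# Hypothetical patterns, written against the line shapes shown above.
GPU_RE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-f-]+)\)$")
MIG_RE = re.compile(r"^MIG (\S+)\s+Device\s+(\d+): \(UUID: (MIG-[0-9a-f-]+)\)$")


def parse_nvidia_smi_list(text):
    """Return one dict per MIG instance, tagged with its parent GPU index."""
    migs = []
    gpu_index = None
    for raw in text.splitlines():
        line = raw.strip()
        gpu_match = GPU_RE.match(line)
        if gpu_match:
            # Remember which physical GPU the following MIG lines belong to.
            gpu_index = int(gpu_match.group(1))
            continue
        mig_match = MIG_RE.match(line)
        if mig_match and gpu_index is not None:
            migs.append({
                "gpu": gpu_index,
                "profile": mig_match.group(1),
                "device": int(mig_match.group(2)),
                "uuid": mig_match.group(3),
            })
    return migs


if __name__ == "__main__":
    for mig in parse_nvidia_smi_list(SAMPLE):
        print(mig["gpu"], mig["profile"], mig["device"], mig["uuid"])
```

On a live node the same function could be fed the captured output of nvidia-smi -L instead of SAMPLE. How the resulting devices map onto gres.conf entries depends on the Slurm version and build, so that step is left to the official GRES documentation.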