Hi Zhang

Thanks for the quick reply. 

Could you please guide me on how to specify MIG partitions in gres.conf and 
slurm.conf?

My MIG configuration is as follows:

root@rl-dgxs-r21-l2:~# sudo nvidia-smi mig -lgi
+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19        9          2:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       10          3:1     |
+-------------------------------------------------------+
|   0  MIG 2g.20gb         14        3          0:2     |
+-------------------------------------------------------+
|   0  MIG 3g.40gb          9        2          4:4     |
+-------------------------------------------------------+

root@rl-dgxs-r21-l2:~# nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-a044d304-28b2-c3f1-42ea-b9440d868231)
  MIG 3g.40gb     Device  0: (UUID: MIG-d4514f04-e287-50e9-b3c4-c19fddbb9aa2)
  MIG 2g.20gb     Device  1: (UUID: MIG-4f393220-5308-51f7-bd7a-322306593545)
  MIG 1g.10gb     Device  2: (UUID: MIG-4d988c3e-160a-52f3-a3e1-8eeccfee4585)
  MIG 1g.10gb     Device  3: (UUID: MIG-4ff411c0-c0e2-5b86-a3a4-e76a6b6491cb)
GPU 1: NVIDIA A100-SXM4-80GB (UUID: GPU-6aecec97-f63e-4815-3c20-503c4e82fa57)
GPU 2: NVIDIA A100-SXM4-80GB (UUID: GPU-2ab474df-49c4-531a-6580-cd44d9982d0a)
GPU 3: NVIDIA DGX Display (UUID: GPU-aa49c52b-640c-39b2-1cee-a0120d0b5fa7)
GPU 4: NVIDIA A100-SXM4-80GB (UUID: GPU-c47adbbe-7ef8-246c-f700-010542e60ad0)

Any suggestions on using MIG partitions in my slurm jobs?

With Warm Regards
Ravi 



From: Shunran Zhang 
Sent: Thursday, November 30, 2023 9:50 PM
To: Ravi Konila ; Slurm User Community List 
Subject: Re: [slurm-users] Autodetect of nvml is not working in

Hi Ravi 

Unfortunately, if the NVML flag was off at compile time (when the maintainer 
built the apt package you installed), that part of the code is not in your 
binary. 

Your only options are to recompile Slurm yourself following the official 
documentation, or to find a repository that builds Slurm with NVML. 
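
A rough way to check whether an installed Slurm was built with NVML is to look 
for the gpu_nvml.so plugin, which is only produced when NVML support is enabled 
at build time. The sketch below is illustrative only: the helper name 
has_nvml_plugin is made up for this example, and the directory you pass in is 
an assumption about where your package puts its plugins.

```shell
#!/bin/sh
# has_nvml_plugin DIR
# Hypothetical helper: succeeds (exit 0) if gpu_nvml.so exists anywhere under
# DIR, and fails (non-zero exit) otherwise. It only checks that the file is
# present; it does not verify that the plugin actually loads.
has_nvml_plugin() {
    [ -n "$(find "$1" -name 'gpu_nvml.so' 2>/dev/null | head -n 1)" ]
}
```

For example, `has_nvml_plugin /usr/lib && echo "NVML plugin found"` searches 
under /usr/lib, which is where I would expect the Ubuntu package to install its 
plugins (an assumption; the PluginDir line in `scontrol show config` should show 
the actual directory). If nothing is found, a rebuild with NVML enabled (the 
configure script has a --with-nvml option) is probably needed.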

Good luck
S. Zhang


  Ravi Konila <ravibh...@gmail.com> wrote on December 1, 2023 at 00:51:


   
  Hi Josef and Rob
  Thanks for the reply.
  I agree that cuda-nvml-devel was not present when I installed slurm-llnl on 
Ubuntu 22.04. 
  I installed it later. 
  I did not build Slurm myself; I installed it with the apt install slurm command. 

  Is there any way to enable NVML support after Slurm has already been installed?

  With Warm Regards
  Ravi K.
  Ph: +91-9901072688 
