Re: [slurm-users] Autodetect of nvml is not working in

2023-11-30 Thread Ravi Konila
Hi Zhang, Thanks for the quick reply. Could you please guide me on specifying MIG partitions in gres.conf and in slurm.conf? My MIG is as below: root@rl-dgxs-r21-l2:~# sudo nvidia-smi mig -lgi ++ | GPU instances:

Re: [slurm-users] Autodetect of nvml is not working in

2023-11-30 Thread Shunran Zhang
Hi Ravi, Unfortunately, if the NVML flag was off at compile time (when the maintainer built the apt package you installed), that part of the code is not in your binary. Recompile yourself following the official documentation or find some repository that builds slurm with NVML are

Re: [slurm-users] Autodetect of nvml is not working in

2023-11-30 Thread Ravi Konila
Hi Josef and Rob, Thanks for the replies. I agree that cuda-nvml-devel was not present when I installed slurm-llnl on Ubuntu 22.04; I installed it later. I did not build Slurm myself but installed it with the apt install slurm command. Is there any way to use it after Slurm is already installed? With Warm Regards

Re: [slurm-users] Autodetect of nvml is not working in gres.conf

2023-11-30 Thread Shunran Zhang
Hi all, Apologies for writing something misleading in the last mail. I missed your error message. Rob was correct: your slurmd appears to have been built without the NVML flag. You need to set up NVML and turn the --with-nvml flag on when configuring slurm to fix the issue if you are compil

Re: [slurm-users] Autodetect of nvml is not working in gres.conf

2023-11-30 Thread Shunran Zhang
Hi all, If you could offer a few more details about your OS and Slurm version, that might shed some light. There is an interesting detail about the NVML package if you are using a RHEL-like OS. The NVML detection part of the slurm library (/usr/lib64/slurm/gpu_nvml.so) is linked against the /lib
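
One way to see what the plugin mentioned above is linked against is to run ldd on it and keep only the lines that name the NVML shared library (libnvidia-ml). This is a minimal sketch, not something from the thread: the function name nvml_link_lines is hypothetical, and it assumes the plugin is dynamically linked against NVML and lives at the path quoted in the message.

```shell
#!/bin/sh
# Filter `ldd` output down to the lines that name the NVML shared library
# (libnvidia-ml). Reads ldd output on stdin and prints the matching lines.
# Exit status is non-zero when no line matches (grep's usual behaviour).
# nvml_link_lines is a hypothetical helper name used for illustration.
nvml_link_lines() {
    grep -i 'libnvidia-ml'
}

# Example, using the plugin path quoted in the message above
# (assumption: adjust the path for your distribution):
#   ldd /usr/lib64/slurm/gpu_nvml.so | nvml_link_lines
```

A line ending in "not found" would suggest the runtime library is missing from the loader path; no matching line at all would suggest the plugin was not linked against NVML in that build.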

Re: [slurm-users] Autodetect of nvml is not working in gres.conf

2023-11-30 Thread Groner, Rob
Did you have --with-nvml as part of your configuration? Go back to your config.log and verify that it ever said it found nvml.h. If not, then you'll need to make sure you have the right nvidia/cuda packages installed on the host you're building slurm on, and you might have to specify --with-nv
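
The config.log check suggested above can be sketched as a simple search. This is only an illustration: the function name nvml_config_lines is hypothetical, the log path is an assumption (use the config.log from your own build tree), and the exact wording configure uses for nvml.h is not reproduced here, so the matching lines are printed for you to read.

```shell
#!/bin/sh
# Print every line of a configure log that mentions NVML, prefixed with its
# line number, so you can read what configure reported about nvml.h.
# Exit status is non-zero when the log never mentions NVML at all.
# nvml_config_lines is a hypothetical helper name used for illustration.
nvml_config_lines() {
    grep -n -i 'nvml' "$1"
}

# Example (assumption: run from the directory where Slurm was configured):
#   nvml_config_lines ./config.log
```

If nothing is printed, configure never looked for or never mentioned NVML in that build, which would match the fatal error reported at the start of the thread.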

Re: [slurm-users] Autodetect of nvml is not working in gres.conf

2023-11-30 Thread Josef Dvoracek
Could it be that the library "cuda-nvml-devel" was not installed when you were building slurm? cheers josef On 30. 11. 23 15:06, Ravi Konila wrote: Hello, My gres.conf has AutoDetect=nvml when I restart slurmd service I do get fatal: We were configured to autodetect nvml functionality, but we w

[slurm-users] Autodetect of nvml is not working in gres.conf

2023-11-30 Thread Ravi Konila
Hello, My gres.conf has AutoDetect=nvml. When I restart the slurmd service, I get: fatal: We were configured to autodetect nvml functionality, but we weren't able to find that lib when Slurm was configured. I referred to a few links and the slurm-users email archives to solve this but could not understand