Hi Zhang,
Thanks for the quick reply.
Could you please guide me on specifying MIG partitions in gres.conf and in
slurm.conf?
My MIG configuration is as below:
root@rl-dgxs-r21-l2:~# sudo nvidia-smi mig -lgi
++
| GPU instances:
Hi Ravi,
Unfortunately, if the NVML flag was off at compile time (when the maintainer
built the apt package for you to install), that part of the code will not be in
your binary.
Recompile yourself following the official documentation or find some repository
that builds slurm with NVML are
Hi Josef and Rob,
Thanks for the reply.
I agree that cuda-nvml-devel was not present when I installed slurm-llnl on
Ubuntu 22.04.
I installed it later.
I did not build slurm myself; I installed it with the apt install slurm command.
Is there any way to enable NVML support after slurm has already been installed?
With Warm Regards
Hi all,
Apologies for writing something misleading in the last mail. I missed your
error message.
Rob was correct: your slurmd appears to have been built without the NVML flag
at compile time.
You need to set up NVML and turn on the --with-nvml flag when
configuring slurm to fix the issue if you are compil
Hi all,
If you could offer a few more details on your OS and Slurm version,
that might shed some light.
There is an interesting detail about the NVML package if you are using
a RHEL-like OS.
The NVML detection part of the slurm library (/usr/lib64/slurm/gpu_nvml.so)
is linked against the /lib
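
One way to look at this on a given node is to check whether the plugin file exists and which shared libraries it resolves to. The following is only a rough sketch: the path is the one quoted above and may differ on other distributions, and the helper name plugin_status is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): report whether a plugin file exists.
plugin_status() {
    if [ -e "$1" ]; then
        echo present
    else
        echo missing
    fi
}

# Path taken from the message above; adjust it for your distribution.
plugin=/usr/lib64/slurm/gpu_nvml.so

echo "gpu_nvml.so: $(plugin_status "$plugin")"

if [ -e "$plugin" ]; then
    # List the shared libraries the plugin resolves to. A "not found"
    # entry would point at a library that is missing on this host.
    ldd "$plugin" || true
fi
```

If the plugin is reported as missing, the installed Slurm build most likely has no NVML support at all, and no package installed afterwards will add it.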
Did you have --with-nvml as part of your configuration? Go back to your
config.log and verify that it ever said it found nvml.h.
If not, then you'll need to make sure you have the right nvidia/cuda packages
installed on the host you're building slurm on, and you might have to specify
--with-nv
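
The config.log check described above could be sketched roughly as follows. This assumes config.log sits in the Slurm build directory and that configure used the standard autoconf header check, which records its result as the cache variable ac_cv_header_nvml_h; if your Slurm version checks for the header differently, that variable will not be present. The helper name found_nvml_header is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeed if config.log records
# that the nvml.h header was found.
# Assumption: the result is stored as the autoconf cache variable
# ac_cv_header_nvml_h, as a plain AC_CHECK_HEADER check would do.
found_nvml_header() {
    log="${1:-config.log}"
    [ -r "$log" ] || return 2
    grep -q '^ac_cv_header_nvml_h=yes' "$log"
}

if [ -r config.log ]; then
    # Print every NVML-related line so the result can also be read by eye.
    grep -i 'nvml' config.log || true
    if found_nvml_header config.log; then
        echo "configure found nvml.h"
    else
        echo "configure did not find nvml.h"
    fi
fi
```

Run it from the directory where configure was executed. The helper returns 2 when the log file cannot be read, so a missing log is not mistaken for a missing header.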
Could it be that the library "cuda-nvml-devel" was not installed when you
were building slurm?
cheers
josef
On 30. 11. 23 15:06, Ravi Konila wrote:
Hello,
My gres.conf has AutoDetect=nvml
when I restart slurmd service I do get
*fatal: We were configured to autodetect nvml functionality, but we
w
Hello,
My gres.conf has AutoDetect=nvml.
When I restart the slurmd service I get
fatal: We were configured to autodetect nvml functionality, but we weren't able
to find that lib when Slurm was configured.
I referred to a few links and the slurm-users email archives to solve this, but
could not understand