Hi,

I'm trying to compile Slurm with NVIDIA NVML support, but the result is not what I expected. The build produces /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" shows no reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1, which I would expect to see.

~$ ldd  /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so
        linux-vdso.so.1 (0x00007ffd9a3f4000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0bc2c06000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0bc2e47000)

/lib/x86_64-linux-gnu/libnvidia-ml.so.1 is present during compilation. I can also see in config.status that the NVML headers were found (otherwise, to my understanding, gpu_nvml.so would not be built at all).

Our old cluster was deployed with NVIDIA deepops (which compiles Slurm on every node) and also has NVML support. There, ldd gives the expected result:

~$ ldd /usr/local/lib/slurm/gpu_nvml.so
...
libnvidia-ml.so.1 => /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007f3b10120000)
...
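
To make the comparison between the two builds concrete, here is a minimal sketch of the check I am doing by eye. The helper name links_against and the two sample strings are only for illustration (they are not part of Slurm); the strings are the ldd lines quoted above, with the elided lines of the old build left out. It only looks at the listing that ldd prints, nothing more.

```python
def links_against(ldd_output: str, soname: str) -> bool:
    """Return True if any line of an ldd listing names the given library."""
    for line in ldd_output.splitlines():
        # ldd lines look like "name => path (address)" or "path (address)";
        # the first whitespace-separated field is the library name or path.
        fields = line.split()
        if fields and soname in fields[0]:
            return True
    return False


# ldd output of the newly built plugin, as quoted above.
NEW_BUILD = """\
        linux-vdso.so.1 (0x00007ffd9a3f4000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0bc2c06000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0bc2e47000)
"""

# The one relevant line from the old (deepops) build, as quoted above.
OLD_BUILD = """\
libnvidia-ml.so.1 => /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007f3b10120000)
"""

print(links_against(NEW_BUILD, "libnvidia-ml.so"))  # False
print(links_against(OLD_BUILD, "libnvidia-ml.so"))  # True
```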

I can't test actual functionality with my new binaries because I don't have a node with GPUs yet.

Am I missing something?

Thank you,
Matthias


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
