Hi,
I'm trying to compile Slurm with NVIDIA NVML support, but the result is
unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but when
I do "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" there is no
reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (which I would
expect).
~$ ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so
linux-vdso.so.1 (0x00007ffd9a3f4000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0bc2c06000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0bc2e47000)
/lib/x86_64-linux-gnu/libnvidia-ml.so.1 is present during compilation.
I can also see in config.status that the NVML headers were found
(otherwise, as I understand it, gpu_nvml.so would not be built at all).
Our old cluster was deployed with NVIDIA deepops (which compiles Slurm
on every node) and also has NVML support. There, ldd shows the expected
result:
~$ ldd /usr/local/lib/slurm/gpu_nvml.so
...
libnvidia-ml.so.1 => /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007f3b10120000)
...
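To make the comparison between the two builds repeatable, here is a minimal sketch of the check I am doing by eye. It only parses ldd output that has already been captured as text; the function names parse_ldd and links_against are made up for illustration, and the sample strings are copied from the outputs above (the old build is reduced to the one line I quoted). Note that a missing entry only tells me there is no link-time dependency; it would not show a library that a plugin loads at runtime via dlopen, which I have not ruled out.

```python
from typing import Dict, Optional


def parse_ldd(output: str) -> Dict[str, Optional[str]]:
    """Map each library named in ldd output to its resolved path, or None."""
    deps: Dict[str, Optional[str]] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Drop the trailing load address, e.g. "(0x00007f0bc2c06000)".
        if line.endswith(")") and "(" in line:
            line = line[: line.rindex("(")].strip()
        if "=>" in line:
            name, _, path = line.partition("=>")
            path = path.strip()
            # "not found" or an empty right-hand side means unresolved.
            deps[name.strip()] = path if path and path != "not found" else None
        else:
            # Entries without "=>" are either absolute paths (the loader)
            # or virtual objects such as linux-vdso.so.1.
            deps[line] = line if line.startswith("/") else None
    return deps


def links_against(output: str, prefix: str = "libnvidia-ml.so") -> bool:
    """Return True if any listed library's file name starts with prefix."""
    return any(
        name.rsplit("/", 1)[-1].startswith(prefix) for name in parse_ldd(output)
    )


# Sample data copied from the two ldd runs quoted in this message.
NEW_BUILD_LDD = """\
linux-vdso.so.1 (0x00007ffd9a3f4000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0bc2c06000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0bc2e47000)
"""

OLD_BUILD_LDD = """\
libnvidia-ml.so.1 => /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (0x00007f3b10120000)
"""

print("new build links NVML:", links_against(NEW_BUILD_LDD))
print("old build links NVML:", links_against(OLD_BUILD_LDD))
```

On a real node the text could come from running ldd on the plugin and passing its standard output to links_against, instead of using the pasted strings.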
I can't test actual functionality with my new binaries because I don't
have a node with GPUs yet.
Am I missing something?
Thank you,
Matthias
--
slurm-users mailing list -- slurm-users@lists.schedmd.com