Package: libhwloc-contrib-plugins
Version: 2.4.1+dfsg-2
Severity: important
Dear Maintainer,
When the libhwloc-contrib-plugins package is installed, running any MPI
program on a Debian 11 host with no GPU produces the following errors:
$ mpirun hostname
CUDA: Failed to get number of devices with cudaGetDeviceCount(): no CUDA-capable device is detected
NVML: Failed to initialize with nvmlInit(): Driver Not Loaded
CUDA: Failed to get number of devices with cudaGetDeviceCount(): no CUDA-capable device is detected
NVML: Failed to initialize with nvmlInit(): Driver Not Loaded
dahu-28.grenoble.grid5000.fr
In a complex program, it is hard to tell where these messages come from
and what the actual problem is. Investigation shows that they are only
warnings: they do not prevent the program from running and can be ignored.
However, when the program fails for an unrelated reason, they can mislead
the user into thinking the failure is CUDA-related when it is not.
The expected behaviour is that hwloc does not print hardware-detection
warnings when nothing is actually wrong.
This bug has already been fixed upstream in version 2.5.0rc1:
835dfbe577fcd7 ("core: don't display "less critical" error messages by default")
https://github.com/open-mpi/hwloc/issues/453
Would it be possible to backport this patch to Debian stable or,
as an alternative, publish hwloc 2.5.0 in bullseye-backports?
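Until a fix is available, a possible interim workaround is sketched below.
It relies on two environment variables documented by hwloc,
HWLOC_HIDE_ERRORS and HWLOC_COMPONENTS. It is an assumption, not verified
against this exact package version, that the CUDA and NVML plugins in 2.4.1
honour HWLOC_HIDE_ERRORS and that the components are named "cuda" and
"nvml". Either setting alone may be sufficient.

```shell
# Assumption: hwloc 2.4.x suppresses these plugin messages when
# HWLOC_HIDE_ERRORS is set to 1.
export HWLOC_HIDE_ERRORS=1

# Alternative (assumed component names): exclude the CUDA and NVML
# discovery components so that they are never initialized.
export HWLOC_COMPONENTS=-cuda,-nvml

# Then run the MPI program as usual, for example:
# mpirun hostname
```

For multi-node runs the variables would also have to reach the remote
processes, which depends on how the MPI launcher propagates the environment.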
Thanks for your time,
Baptiste
-- System Information:
Debian Release: 11.0
APT prefers stable-security
APT policy: (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 5.10.0-8-amd64 (SMP w/64 CPU threads)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL
set to en_US.UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages libhwloc-contrib-plugins depends on:
ii libc6 2.31-13
ii libcudart11.0 5000.0g5k1
ii libhwloc15 2.4.1+dfsg-1
ii libnvidia-ml1 5000.0g5k1
libhwloc-contrib-plugins recommends no packages.
libhwloc-contrib-plugins suggests no packages.
-- no debconf information