Package: libhwloc-contrib-plugins
Version: 2.4.1+dfsg-2
Severity: important

Dear Maintainer,

When the libhwloc-contrib-plugins package is installed, running any MPI
program on a Debian 11 host with no GPU produces the following errors:

    $ mpirun hostname
    CUDA: Failed to get number of devices with cudaGetDeviceCount(): no CUDA-capable device is detected
    NVML: Failed to initialize with nvmlInit(): Driver Not Loaded
    CUDA: Failed to get number of devices with cudaGetDeviceCount(): no CUDA-capable device is detected
    NVML: Failed to initialize with nvmlInit(): Driver Not Loaded
    dahu-28.grenoble.grid5000.fr

For complex programs, it is quite hard to work out where these messages
come from and what the exact problem is.  After investigation, it turns out
that they are only warnings and do not prevent the program from executing,
so they can safely be ignored.  However, when the program fails for an
unrelated reason, they can mislead the user into thinking that the failure
is CUDA-related when it is not.

The expected behaviour is that hwloc should not print warnings about
hardware detection when nothing is actually wrong.

This bug has already been fixed upstream in version 2.5.0rc1:

    835dfbe577fcd7 ("core: don't display "less critical" error messages by default")
    https://github.com/open-mpi/hwloc/issues/453

Would it be possible to backport this patch to Debian stable or,
as an alternative, publish hwloc 2.5.0 in bullseye-backports?

Thanks for your time,
Baptiste

-- System Information:
Debian Release: 11.0
  APT prefers stable-security
  APT policy: (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-8-amd64 (SMP w/64 CPU threads)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libhwloc-contrib-plugins depends on:
ii  libc6          2.31-13
ii  libcudart11.0  5000.0g5k1
ii  libhwloc15     2.4.1+dfsg-1
ii  libnvidia-ml1  5000.0g5k1

libhwloc-contrib-plugins recommends no packages.

libhwloc-contrib-plugins suggests no packages.

-- no debconf information
