On Mon, 18 Nov 2024 09:58:03 +0100 Uwe Kleine-König wrote:

[...]
> On Wed, Nov 13, 2024 at 11:15:03PM +0100, Francesco Poli wrote:
> > On Mon, 11 Nov 2024 11:22:26 +0100 Uwe Kleine-König wrote:
[...]
> > > I guess the kernel provides a directory "/sys/class/infiniband_mad". Do
> > > its contents look different on 6.10.x and 6.11.x?
> > 
> > I will look into this as soon as I can reboot the cluster head node.

I looked into this while testing the new Debian Linux kernel that has
just migrated to testing (and which, once again, makes opensm fail to
start, just like the other 6.11.x versions).

With a working kernel:

  $ uname -v
  #1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1 (2024-09-22)
  $ ls -altrF /sys/class/infiniband_mad/
  total 0
  lrwxrwxrwx  1 root root    0 Nov  4 15:58 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
  lrwxrwxrwx  1 root root    0 Nov  4 15:58 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
  lrwxrwxrwx  1 root root    0 Nov 11 15:54 issm1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/issm1/
  lrwxrwxrwx  1 root root    0 Nov 11 15:54 issm0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/issm0/
  drwxr-xr-x  2 root root    0 Nov 11 15:54 ./
  drwxr-xr-x 72 root root    0 Nov 11 15:54 ../
  -r--r--r--  1 root root 4096 Nov 11 15:54 abi_version
  $ cat /sys/class/infiniband_mad/abi_version 
  5

With a kernel that makes opensm fail to start:

  $ uname -v
  #1 SMP PREEMPT_DYNAMIC Debian 6.11.7-1 (2024-11-09)
  $ ls -altrF /sys/class/infiniband_mad/
  total 0
  drwxr-xr-x 73 root root    0 Nov 18 09:41 ../
  -r--r--r--  1 root root 4096 Nov 18 09:41 abi_version
  lrwxrwxrwx  1 root root    0 Nov 18 09:41 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
  lrwxrwxrwx  1 root root    0 Nov 18 09:41 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
  drwxr-xr-x  2 root root    0 Nov 18 09:43 ./
  $ cat /sys/class/infiniband_mad/abi_version
  5

As you can see, the issm0 and issm1 symlinks are missing here...
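
In case it helps to repeat the comparison on other kernels, here is a
minimal sketch of a check. It assumes that every umadN entry should come
with a matching issmN entry, which is only what the working 6.10.11
listing above suggests. The function name missing_issm is made up for
illustration.

```shell
# Hypothetical helper: print the name of each issmN entry that is
# missing for an existing umadN entry in the given directory.
# The directory defaults to the sysfs path listed above.
missing_issm() {
    dir=${1:-/sys/class/infiniband_mad}
    for umad in "$dir"/umad*; do
        # Skip the literal pattern when no umad entry exists at all.
        [ -e "$umad" ] || continue
        # Keep only the numeric suffix, e.g. ".../umad1" -> "1".
        n=${umad##*/umad}
        if [ ! -e "$dir/issm$n" ]; then
            echo "issm$n"
        fi
    done
}
```

Calling missing_issm without arguments inspects
/sys/class/infiniband_mad; a directory can be passed as the first
argument to check a copy of the tree instead.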

Does this ring a bell?
Can you tell what's wrong just by looking at this?
Or, at least, do you get some less vague idea of what's going on?

[...]
> > Before I go on and try to install the resulting Debian package, could
> > you please review the transcript of what I did (see the attached file)?
> 
> Looks good. Probably the individual answers don't matter much and the
> default should be fine. Just continue with my instructions and if the
> resulting kernels boots and behave as the respective versions packaged
> by Debian, everything is fine. Iff that fails, a more detailed review is
> needed.

Thanks for confirming. I really hope I can find a time window in which
I can bisect...


-- 
 http://www.inventati.org/frx/
 There's not a second to spare! To the laboratory!
..................................................... Francesco Poli .
 GnuPG key fpr == CA01 1147 9CD2 EFDF FB82  3925 3E1C 27E1 1F69 BFFE

