Package: linux-image-6.12.41+deb13-amd64
Severity: normal

Dear Maintainer,

After the upgrade to Debian Trixie, my mdadm RAID1 and ZFS mirrors
failed, reporting one of the two Intel P4510 SSDs in each mirror as
offline.

~~~
This is an automatically generated mail message.
DegradedArray event detected on md device /dev/md/1
The /proc/mdstat file currently contains the following:

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 nvme1n1p2[0]
      379868160 blocks super 1.2 [2/1] [U_]
      bitmap: 3/3 pages [12KB], 65536KB chunk

[...]
~~~
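For monitoring, the degraded state shown above can also be detected programmatically. A minimal sketch, assuming the /proc/mdstat format quoted above (the function name is my own):

```shell
# Print md status lines whose member map contains an underscore,
# i.e. a missing device, as in the "[U_]" line above.
check_degraded() {
    grep -E '\[[0-9]+/[0-9]+\] +\[U*_+U*\]' "${1:-/proc/mdstat}"
}

# Example: check_degraded && echo "an array is degraded"
```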

The NVMe device does not appear to be recognized at all, although lspci
still lists it as present (the output below is taken from the “working”
state, but the failing state looked similar):

~~~
10000:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
10000:02:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
~~~
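One way to confirm that this is a driver probe failure rather than a dead drive is to compare the number of controllers lspci reports against the devices the kernel instantiated under /sys/class/nvme. A small helper sketch (the function name is my own):

```shell
# Count NVMe controllers in `lspci -D` output read from stdin.
# Comparing this with `ls /sys/class/nvme | wc -l` shows whether the
# kernel bound the nvme driver to every controller the PCI bus reports.
count_nvme_controllers() {
    grep -c 'Non-Volatile memory controller'
}

# Example: lspci -D | count_nvme_controllers
```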

After upgrading the workstation's UEFI firmware, the problem still
persisted; I eventually tracked it down to the difference in Linux
kernel versions.

Currently, I am back to using the Linux 6.1-series kernel from Bookworm,
which correctly recognizes both SSDs.

The following kernel log (as recorded by `journalctl -k`) is observed on
the 6.12 kernel (mirror fails, only one drive recognized):

~~~
Aug 24 21:55:18: [drm] Found VCN firmware Version ENC: 1.24 DEC: 8 VEP: 0 Revision: 3
Aug 24 21:55:18: nvme nvme0: pci function 10000:01:00.0
Aug 24 21:55:18: nvme 10000:01:00.0: enabling device (0000 -> 0002)
Aug 24 21:55:18: pcieport 10000:00:02.0: can't derive routing for PCI INT A
Aug 24 21:55:18: nvme 10000:01:00.0: PCI INT A: no GSI
Aug 24 21:55:18: nvme nvme1: pci function 10000:02:00.0
Aug 24 21:55:18: pcieport 10000:00:03.0: can't derive routing for PCI INT A
Aug 24 21:55:18: nvme 10000:02:00.0: PCI INT A: no GSI
Aug 24 21:55:18: sr 6:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
Aug 24 21:55:18: cdrom: Uniform CD-ROM driver Revision: 3.20
Aug 24 21:55:18: nvme nvme1: D3 entry latency set to 15 seconds
Aug 24 21:55:18: nvme nvme1: 32/0/0 default/read/poll queues
Aug 24 21:55:18: nvme nvme1: Ignoring bogus Namespace Identifiers
Aug 24 21:55:18:  nvme1n1: p1 p2 p3 p4
Aug 24 21:55:18: amdgpu 0000:67:00.0: amdgpu: reserve 0x900000 from 0x81fd000000 for PSP TMR
~~~

Compare the 6.1 kernel (mirror works, both drives recognized):

~~~
Aug 27 18:18:16: sd 1:0:0:0: [sdb] Attached SCSI disk
Aug 27 18:18:16: nvme nvme0: pci function 10000:01:00.0
Aug 27 18:18:16: pcieport 10000:00:02.0: can't derive routing for PCI INT A
Aug 27 18:18:16: nvme 10000:01:00.0: PCI INT A: no GSI
Aug 27 18:18:16: nvme nvme1: pci function 10000:02:00.0
Aug 27 18:18:16: pcieport 10000:00:03.0: can't derive routing for PCI INT A
Aug 27 18:18:16: nvme 10000:02:00.0: PCI INT A: no GSI
Aug 27 18:18:16: nvme nvme1: Shutdown timeout set to 15 seconds
Aug 27 18:18:16: nvme nvme1: 32/0/0 default/read/poll queues
Aug 27 18:18:16: nvme nvme1: Ignoring bogus Namespace Identifiers
Aug 27 18:18:16:  nvme1n1: p1 p2 p3 p4
Aug 27 18:18:16: ixgbe 0000:b3:00.0: Intel(R) 10 Gigabit Network Connection

[...some unrelated hardware messages skipped...]

Aug 27 18:18:16: hid-generic 0003:0738:1703.0008: input,hidraw7: USB HID v1.00 Mouse [Madcatz Mad Catz R.A.T.3 Mouse] on usb-0000:00:14.0-7.1.4/input0
Aug 27 18:18:16: nvme nvme0: Shutdown timeout set to 15 seconds
Aug 27 18:18:16: nvme nvme0: 32/0/0 default/read/poll queues
Aug 27 18:18:16: nvme nvme0: Ignoring bogus Namespace Identifiers
Aug 27 18:18:16:  nvme0n1: p1 p2 p3 p4
~~~

On 6.12, the “second” block in which nvme0n1 would be recognized never
appears.
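To make such comparisons easier, the two logs can be reduced to their nvme-related lines with the timestamps stripped before diffing. A sketch, assuming the journal line format shown above (the log file names are placeholders):

```shell
# Keep only nvme-related lines of a saved kernel log and strip the
# leading "Mon DD HH:MM:SS: " timestamp, so that diff shows only
# structural differences between boots.
nvme_lines() {
    grep 'nvme' "$1" | sed -E 's/^[A-Za-z]+ +[0-9]+ +[0-9:]+: //'
}

# Example (bash): diff <(nvme_lines dmesg-6.12.txt) <(nvme_lines dmesg-6.1.txt)
```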

I currently use the following two out-of-tree modules:

 * ZFS on Linux (zfs-dkms)
 * Merging Ravenna ALSA Linux Kernel Module (third-party module)

I attempted to test with linux-image-6.16-amd64-unsigned from
experimental, but since neither module would build against it, I could
not complete that particular test.

After some searching, I also found the following bug report in Ubuntu
(LP: #2111521), which could be the same issue and might contain a
patch: <https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521>.

To summarize, here are my answers to the reportbug questions:

> What led up to the situation?

Upgrade from Debian 12 Bookworm to Debian 13 Trixie

> What exactly did you do (or not do) that was effective (or
> ineffective)?

Upgraded the UEFI firmware - no success, same error.
Checked the cabling in the PC - all looks OK.
Booted the old Linux kernel - it works for now.

> What was the outcome of this action?

After the upgrade, Linux 6.12 no longer recognizes the second NVMe SSD.

> What outcome did you expect instead?

Expected drives and mirrors to continue working normally.

Thanks in advance and kind regards,
Linux-Fan

-- System Information:
Debian Release: 13.0
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.1.0-38-amd64 (SMP w/36 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages linux-image-6.12.41+deb13-amd64 depends on:
ii  initramfs-tools [linux-initramfs-tool]  0.148.3
ii  kmod                                    34.2-2
ii  linux-base                              4.12

Versions of packages linux-image-6.12.41+deb13-amd64 recommends:
ii  apparmor  4.1.0-1

Versions of packages linux-image-6.12.41+deb13-amd64 suggests:
pn  debian-kernel-handbook  <none>
ii  extlinux                3:6.04~git20190206.bf6db5b4+dfsg1-3.1
ii  firmware-linux-free     20241210-2
ii  grub-efi-amd64          2.12-9
pn  linux-doc-6.12          <none>
