Package: linux-image-6.12.1+debian+tj
Followup-For: Bug #1076372
X-Debbugs-Cc: tj.iam...@proton.me
I've done some investigative work with the logs and the lspci reports in an attempt to narrow down potential causes and identify relevant clues.

1. A comparison of the Lexar NM790 PCIe configuration between M2 slot 1 and slot 2 does not show any significant difference:

$ diff -b -u "Slot "{1,2}" Lexar NM790.lspci.txt"
--- "Slot 1 Lexar NM790.lspci.txt"      2024-12-18 13:59:02.675502501 +0000
+++ "Slot 2 Lexar NM790.lspci.txt"      2024-12-18 14:08:42.770250485 +0000
@@ -1,12 +1,11 @@
-02:00.0 Non-Volatile memory controller: Shenzhen Longsys Electronics Co., Ltd. Lexar NM790 NVME SSD (DRAM-less) (rev 01) (prog-if 02 [NVM Express])
+03:00.0 Non-Volatile memory controller: Shenzhen Longsys Electronics Co., Ltd. Lexar NM790 NVME SSD (DRAM-less) (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Shenzhen Longsys Electronics Co., Ltd. Lexar NM790 NVME SSD (DRAM-less)
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
-       Interrupt: pin A routed to IRQ 41
-       NUMA node: 0
-       IOMMU group: 14
-       Region 0: Memory at f6e00000 (64-bit, non-prefetchable) [size=16K]
+       Interrupt: pin A routed to IRQ 39
+       IOMMU group: 16
+       Region 0: Memory at f6c00000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

2. The common factor seems to be M2 slot 1, which, as I noted previously, is specified as Gen5x4 whereas slot 2 is Gen4x4.

3. The corrupted block size is reported as most likely being 128KiB. I examined all 332 commits in the range v6.3.5..v6.3.7 and one in particular stands out because it clamps the NVMe queue max_hw_sectors limit (exposed via sysfs) to exactly this value:

Refs:   v6.3-rc2-139-g3710e2b056cb9
Commit: 3710e2b056cb92ad816e4d79fa54a6a5b6ad8cbd
Title:  nvme-pci: clamp max_hw_sectors based on DMA optimized limitation

There are recent follow-up discussions related to this commit:

"nvme-sysfs: display max_hw_sectors_kb without requiring namespaces"
https://lkml.org/lkml/2024/10/16/1721

The "DMA optimized limitation" refers to:

Refs:   v4.7-rc1~31^2~1
Commit: 9257b4a206fc0229dd5f84b78e4d1ebf3f91d270
Title:  iommu/iova: introduce per-cpu caching to iova allocation

That commit concerns I/O Virtual Address (IOVA) space range caching, and its log says:

"To keep the cache size reasonable, we bound the IOVA space a CPU can cache by 32 MiB (we cache a bounded number of IOVA ranges, and only ranges of size <= 128 KiB). The shared global cache is bounded at 4 MiB of IOVA space."

The nvme-pci commit mentions the NVMe MDTS value (Maximum Data Transfer Size), which is expressed as a power-of-2 multiple of MPSMIN (Memory Page Size Minimum). It is worth checking what the two devices support. It could be that, combined with the Gen5x4 slot 1 of this motherboard, there is some incompatibility for certain transfer sizes here. Example:

$ sudo /usr/sbin/nvme show-regs -H /dev/nvme0 | grep -a MPSMIN
get-property: Invalid argument
Memory Page Size Minimum (MPSMIN): 4096 bytes

$ sudo /usr/sbin/nvme id-ctrl /dev/nvme0 | grep -a mdts
mdts      : 5

$ echo $(( 2 ** 5 ))
32
$ echo $(( 2 ** 5 * 4096 ))
131072

So here the device's maximum data transfer size works out to 128KiB, the same value as the suspected corruption block size.
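
To compare what the two devices support, the same queries could be run against both controllers. A quick sketch; I'm assuming here that the second drive enumerates as /dev/nvme1, so adjust the device names to whatever "nvme list" reports:

$ for dev in /dev/nvme0 /dev/nvme1; do
>     echo "== $dev =="
>     # CAP.MPSMIN from the controller registers
>     sudo /usr/sbin/nvme show-regs -H "$dev" | grep -a MPSMIN
>     # MDTS from the Identify Controller data
>     sudo /usr/sbin/nvme id-ctrl "$dev" | grep -a mdts
> done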
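
It might also be worth confirming whether the 128KiB clamp from commit 3710e2b056cb is actually visible in the block queue limits on the affected kernels. Assuming the affected drive's namespace shows up as nvme0n1, the limits can be read from sysfs; if the clamp applies I would expect max_hw_sectors_kb to report 128:

$ cat /sys/block/nvme0n1/queue/max_hw_sectors_kb
$ cat /sys/block/nvme0n1/queue/max_sectors_kb

Comparing these values between kernels on either side of the v6.3.5..v6.3.7 window, and between the two M2 slots, would show whether the clamp correlates with the corruption.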
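
Since the 128KiB figure comes from the IOVA range cache limit, my understanding is that the clamp (via dma_opt_mapping_size()) only takes effect when the device's DMA is translated by the IOMMU, i.e. its group uses a DMA or DMA-FQ domain rather than identity/passthrough. The domain type can be read per IOMMU group; the group numbers below are taken from the lspci diff above (14 for slot 1, 16 for slot 2) and may differ between boots:

$ cat /sys/kernel/iommu_groups/14/type
$ cat /sys/kernel/iommu_groups/16/type
$ dmesg | grep -iE 'iommu|amd-vi|dmar'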
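
Finally, regarding the Gen5x4 vs Gen4x4 slot difference in point 2, the negotiated link speed and width (and any downgrade flags) can be checked from the PCI Express capability. The BDF addresses are taken from the corresponding lspci captures above (02:00.0 in slot 1, 03:00.0 in slot 2) and will change with the topology:

$ sudo lspci -vv -s 02:00.0 | grep -E 'LnkCap:|LnkSta:'
$ sudo lspci -vv -s 03:00.0 | grep -E 'LnkCap:|LnkSta:'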