Package: linux-image-6.12.1+debian+tj
Followup-For: Bug #1076372
X-Debbugs-Cc: tj.iam...@proton.me

I've done some investigative work with the logs and the lspci reports in
an attempt to narrow down potential causes and identify relevant clues.

1. Comparison of the Lexar NM790 PCIe configuration between M.2 slot 1 and
slot 2 does not show any significant difference:

$ diff -b -u "Slot "{1,2}" Lexar NM790.lspci.txt"
--- "Slot 1 Lexar NM790.lspci.txt"      2024-12-18 13:59:02.675502501 +0000
+++ "Slot 2 Lexar NM790.lspci.txt"      2024-12-18 14:08:42.770250485 +0000
@@ -1,12 +1,11 @@
-02:00.0 Non-Volatile memory controller: Shenzhen Longsys Electronics Co., Ltd. Lexar NM790 NVME SSD (DRAM-less) (rev 01) (prog-if 02 [NVM Express])
+03:00.0 Non-Volatile memory controller: Shenzhen Longsys Electronics Co., Ltd. Lexar NM790 NVME SSD (DRAM-less) (rev 01) (prog-if 02 [NVM Express])
     Subsystem: Shenzhen Longsys Electronics Co., Ltd. Lexar NM790 NVME SSD (DRAM-less)
     Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
     Latency: 0, Cache Line Size: 64 bytes
-    Interrupt: pin A routed to IRQ 41
-    NUMA node: 0
-    IOMMU group: 14
-    Region 0: Memory at f6e00000 (64-bit, non-prefetchable) [size=16K]
+       Interrupt: pin A routed to IRQ 39
+       IOMMU group: 16
+       Region 0: Memory at f6c00000 (64-bit, non-prefetchable) [size=16K]
     Capabilities: [40] Power Management version 3
         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

2. The common factor seems to be M.2 slot 1 which, as I noted previously,
is specified as Gen5 x4 whereas slot 2 is Gen4 x4.
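
As a cross-check, the negotiated link speed and width in each slot can be
read from the PCI Express capability via lspci; the bus addresses below are
taken from the diff in point 1 (02:00.0 in slot 1, 03:00.0 in slot 2) and may
differ after re-enumeration:

$ sudo lspci -vv -s 02:00.0 | grep -E 'LnkCap:|LnkSta:'
$ sudo lspci -vv -s 03:00.0 | grep -E 'LnkCap:|LnkSta:'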

3. The corruption block size is reported as likely to be 128 KiB. I examined
all 332 commits in the range v6.3.5..v6.3.7 and one in particular stands out
because it mentions clamping the NVMe queue max_hw_sectors (exposed in sysfs
as max_hw_sectors_kb) to this value:

Refs: v6.3-rc2-139-g3710e2b056cb9
Commit: 3710e2b056cb92ad816e4d79fa54a6a5b6ad8cbd
Title: nvme-pci: clamp max_hw_sectors based on DMA optimized limitation
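
For anyone wanting to reproduce that search in a linux-stable checkout,
something like the following should give the commit count for the range and
pick the clamp commit out by its subject (the grep pattern is only
illustrative):

$ git rev-list --count v6.3.5..v6.3.7
$ git log --oneline v6.3.5..v6.3.7 --grep='max_hw_sectors'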

There are recent follow-up discussions related to this commit:

"nvme-sysfs: display max_hw_sectors_kb without requiring namespaces"

https://lkml.org/lkml/2024/10/16/1721
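
On a running system the clamped limit ends up visible in the block queue
sysfs attributes; assuming the namespace is nvme0n1, it can be read with:

$ cat /sys/block/nvme0n1/queue/max_hw_sectors_kb

A value of 128 here would line up with the suspected 128 KiB corruption
block size.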

The "DMA optimized limitation" refers to

Refs: v4.7-rc1~31^2~1
Commit: 9257b4a206fc0229dd5f84b78e4d1ebf3f91d270
Title: iommu/iova: introduce per-cpu caching to iova allocation

This relates to I/O Virtual Address (IOVA) space range caching; the commit
message says:
"To keep the cache size reasonable, we bound the IOVA space a CPU can
cache by 32 MiB (we cache a bounded number of IOVA ranges, and only
ranges of size <= 128 KiB).  The shared global cache is bounded at
4 MiB of IOVA space."
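
Since that per-CPU IOVA caching only applies when the device is behind an
active IOMMU (and the lspci output above does show both devices assigned to
IOMMU groups), it may also be worth confirming the IOMMU/DMA setup on the
affected machine, for example:

$ ls /sys/class/iommu/
$ sudo dmesg | grep -iE 'iommu|dmar|amd-vi'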

The nvme-pci commit mentions the NVMe MDTS value (Maximum Data Transfer
Size), which is expressed as a power-of-two multiple of MPSMIN (Memory Page
Size Minimum).

It is worth checking what the two devices support. It could be that, combined
with the Gen5 x4 slot 1 of this motherboard, there is some incompatibility
for certain transfer sizes here.

Example:

$ sudo /usr/sbin/nvme show-regs -H /dev/nvme0 | grep -a MPSMIN
get-property: Invalid argument
Memory Page Size Minimum         (MPSMIN): 4096 bytes
$ sudo /usr/sbin/nvme id-ctrl /dev/nvme0 | grep -a mdts
mdts      : 5
$ echo $(( 2 ** 5 ))
32
$ echo $(( 2 ** 5 * 4096 ))
131072

So this device's maximum data transfer size works out to 128 KiB, matching
the suspected corruption block size.
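
Running the same check against the other NVMe device would show whether both
report the same MDTS; the second device node name below is an assumption:

$ for d in /dev/nvme0 /dev/nvme1; do echo "== $d =="; sudo /usr/sbin/nvme id-ctrl "$d" | grep -a -E '^mn |^mdts'; done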
