Hi, first, answers to your questions / remarks / reminders / notes:
1. I installed the latest mainboard firmware (UEFI / BIOS).

2. Both tested NVMe SSDs have a capacity of 4TB. The qualifying storage list does not contain any 4TB SSD from Lexar or Kingston.

3. The corruption occurs locally, but maybe only if large amounts of data are transferred (see also 4.). The easiest way to reproduce the bug is to use the tool `f3`, which writes and verifies files with a pseudo-random pattern (it is intended to detect fake memory cards).

4. Strangely enough, I probably generated thousands of corrupted files, but never noticed any file system errors. That's why my first idea was that it is a filesystem issue.

5. The Lexar SSD with > 6.1 kernels in the primary M.2 socket (front side) produces write errors; the Kingston SSD with the 6.1 kernel in the primary M.2 socket produces read errors.

6. The errors (read + write) occur bulk-wise, i.e. the 1GB files (written and read by f3) are either o.k. or larger portions of them are defective. At least when write errors occur, the portions are often (but not always) multiples of 128KB.

7. The ASRock X600M(-STX) is chipset-less, i.e. the AMD 8700G CPU runs in SoC mode.

Conclusion:

A. I disagree that it is an SSD-specific issue. For example, the older Lexar SSD ran in the previous PC without any issues and works in the secondary (rear) M.2 socket with both tested kernels. On the other hand, both tested SSDs produce issues in the primary socket with some kernels.

B. I think that the bug is either CPU-specific (see 7.) or mainboard-specific, and because there are issues with both tested SSDs, someone did a bad job testing the hardware.

C. Because some symptoms are quite weird (4., 5.), it may be something unusual, like a module writing into memory that belongs to another module.

Testing:

At the moment, and also within the next months, testing is difficult because the PC is installed remotely and in full use. (For testing I need to swap the SSDs ...) But I'll try to test the latest mainline kernels (LTS and stable) in November in order to verify that it is not a Debian issue.
(At least the LTS kernel can be tested remotely.)

Regards
Stefan

On 30.10.24 at 22:42, Tj wrote:
Package: linux-image-6.11.5+debian+tj
Followup-For: Bug #1076372
X-Debbugs-Cc: tj.iam...@proton.me

Following up from the kernel team discussion this evening, of which I only caught the tail-end, I've reviewed this report and have the following suggestions and observations.

It would be good to see complete-from-boot kernel logs for good/bad results, linked to which M.2 slot each device is in.

That mobo (ASRock X600M-STX) has 3 M.2 slots, 2 for SSDs (+1 for WiFi): M2_1 Gen5x4 on the front of the PCB, and M2_2 Gen4x4 on the rear.

ASRock also publish a qualifying storage device list and the Lexar LNM790 appears to be on it:

https://www.asrock.com/Nettop/AMD/DeskMini%20X600%20Series/index.asp#Storage

(and it presumably implicitly requires the most recent UEFI; there have been several very recent updates):

https://www.asrock.com/Nettop/AMD/DeskMini%20X600%20Series/index.asp#BIOS

Also of use would be to know if the corruption occurs for locally generated data. The report states the data is received from the network, so with current knowledge the issue could be on the network side.

Also, with my forensics hat on, being shown the expected data vs. the corrupted data might give clues as to what type of cause it is. For example, I've dealt with situations where a single register bit would flip occasionally and the data stream would be scrambled until it flipped once again and the data became unscrambled. Or it could be bits/words being lost entirely, so that when stored the remaining data is at a different offset to the original, but is still there.
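
The expected-vs-corrupted comparison suggested above could look roughly like the following. This is only a minimal Python sketch under stated assumptions: it is not how `f3` generates or checks its pattern, all names (`pattern`, `corrupted_runs`, `single_bit_mask`) are made up for illustration, and the 512-byte sector size is an assumption. It lists contiguous corrupted runs, lets you check whether their lengths are multiples of 128KB (as in observation 6.), and tests whether a run looks like one stuck or flipped bit.

```python
import random

KIB = 1024
CHUNK = 128 * KIB  # granularity reported for many of the write errors


def pattern(size: int, seed: int) -> bytes:
    """Reproducible pseudo-random test data (NOT f3's actual generator)."""
    return random.Random(seed).randbytes(size)  # needs Python 3.9+


def corrupted_runs(expected: bytes, actual: bytes, sector: int = 512):
    """Return (offset, length) for each contiguous run of mismatching sectors.

    The sector size is an assumption; differences are only located with
    sector granularity, not byte granularity.
    """
    runs = []
    start = None
    size = max(len(expected), len(actual))
    for off in range(0, size, sector):
        bad = expected[off:off + sector] != actual[off:off + sector]
        if bad and start is None:
            start = off  # a corrupted run begins here
        elif not bad and start is not None:
            runs.append((start, off - start))
            start = None
    if start is not None:  # run extends to the end of the data
        runs.append((start, size - start))
    return runs


def single_bit_mask(expected: bytes, actual: bytes, off: int, length: int):
    """Return the XOR mask if all differing bytes in the run differ by the
    same single bit (hinting at a flipped bit), otherwise None."""
    masks = {
        e ^ a
        for e, a in zip(expected[off:off + length], actual[off:off + length])
        if e != a
    }
    if len(masks) == 1:
        mask = masks.pop()
        if mask & (mask - 1) == 0:  # exactly one bit set
            return mask
    return None


# Demo: simulate one zeroed 128 KiB chunk inside otherwise good data.
good = pattern(4 * CHUNK, seed=1)
damaged = bytearray(good)
damaged[CHUNK:2 * CHUNK] = bytes(CHUNK)
for off, length in corrupted_runs(good, bytes(damaged)):
    print(f"offset={off} length={length} multiple_of_128K={length % CHUNK == 0}")
```

With real data, `expected` would be the regenerated pattern (or a known-good copy of the file) and `actual` the content read back from the SSD. Data lost or shifted to a different offset would need a separate alignment search, which this sketch does not attempt.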