On 05/10/2022 01:00, David Christensen wrote:
> I have moved the majority of my data to servers with ECC memory and ZFS mirrors, but I have little to no defense against memory errors on my desktops and laptops without ECC memory. So, I keep as little data as possible on the latter, and back up/archive daily or sooner.
Fortunately, we are blessed with AMD Ryzen processors that have ECC functionality unlocked. Nothing stupid like on Intel processors, where ECC is fully supported in the silicon but disabled because there is no "Xeon" in the name. I use this in my desktop PC:

CPU: 8-core model: AMD Ryzen 7 5800X
Mobo: ASUSTeK model: PRIME B550-PLUS
Memory: RAM: total: 62.72 GiB used: 19.94 GiB (31.8%)
  Array-1: capacity: 128 GiB slots: 4 EC: Multi-bit ECC
  Device-1: DIMM_A1 type: no module installed
  Device-2: DIMM_A2 type: DDR4 size: 32 GiB speed: 3600 MT/s
  Device-3: DIMM_B1 type: no module installed
  Device-4: DIMM_B2 type: DDR4 size: 32 GiB speed: 3600 MT/s

It works amazingly well. Not a single bit has been lost due to memory since I built it. I did have files corrupted by an SSD, though. Btrfs detects checksum errors, so I know right away what is happening. No long-term bit rot and no important data lost, as would have happened with other filesystems and non-ECC memory.

My home server is similar, but an older build:

CPU: Info: 8-Core model: AMD Ryzen 7 1700
Mobo: ASUSTeK model: PRIME B350-PLUS
Memory: RAM: total: 62.81 GiB used: 12.13 GiB (19.3%)
  Array-1: capacity: 128 GiB slots: 4 EC: Multi-bit ECC
  Device-1: DIMM_A1 size: 16 GiB speed: 2666 MT/s
  Device-2: DIMM_A2 size: 16 GiB speed: 2666 MT/s
  Device-3: DIMM_B1 size: 16 GiB speed: 2666 MT/s
  Device-4: DIMM_B2 size: 16 GiB speed: 2666 MT/s

I had at least two critical losses from file corruption before I went all-ECC. For the two years since, it has been perfectly stable.
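
The "EC: Multi-bit ECC" field in such listings comes from firmware tables, so it can be worth cross-checking that the kernel is actually receiving ECC reports. A minimal Python sketch, assuming a Linux system where an EDAC driver (for example amd64_edac on Ryzen) is loaded and exposes its counters under /sys/devices/system/edac/mc; the function name read_edac_counts is made up for illustration:

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}} per EDAC memory controller.

    "ce" is the corrected-error count, "ue" the uncorrected-error count.
    An empty dict means no EDAC controllers were found under `base`
    (assumption: the driver is not loaded or ECC reporting is unavailable).
    """
    counts = {}
    base_path = Path(base)
    if not base_path.is_dir():
        return counts
    # Controllers appear as mc0, mc1, ... inside the base directory.
    for mc in sorted(base_path.glob("mc[0-9]*")):
        entry = {}
        for key, fname in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / fname).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("No EDAC memory controllers found")
    for name, c in result.items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

A corrected count that rises over time suggests ECC is doing its job on a marginal module; an empty result only means the kernel is not reporting, not necessarily that the hardware lacks ECC.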
> Do you have a URL that documents bugs in memtest86+?
In Debian? Not that I am aware of. But over my last two years of refurbishing somewhat older machines, about 50 assorted desktops and laptops, memtest86+ would often just hang (no progress, although the small ASCII characters keep moving around as if it were doing something). That happens especially when the maximum amount of RAM the old PC BIOS supports is installed. Windows and Linux work fine with that maximum, for example 4 GB on older machines or 8 GB on somewhat newer ones. The Windows 10 memory tester would even test it fine. And the Linux "memtester", which runs while the system is running, also gives a perfect result with no errors. But memtest86+ would just hang, always at exactly the same percentage in the first test, and the computer would need to be reset.

It happens with memtest86+ from the Linux Mint ISO too, so perhaps this is one for the software's authors, not for the Debian BTS. It is a rare error, on old exotic builds, but annoying. That's why, where I can (on UEFI), I prefer to use PassMark's proprietary MemTest86.

-- 
With kindest regards, Piotr.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org/
⠈⠳⣄⠀⠀⠀⠀