Package: libnvme1
Version: 1.3-1
Severity: important
Tags: patch

Dear Maintainer,

libnvme has a serious bug that, on some NVMe hardware, can trigger DMA writes that overwrite the memory of unrelated processes, resulting in random crashes and other system stability issues. Simply running `nvme list` is enough to trigger it.

This was very recently fixed upstream in these two commits:

https://github.com/linux-nvme/libnvme/commit/a2b8e52e46cfd888ac5a48d8ce632bd70a5caa93
https://github.com/linux-nvme/libnvme/commit/68c6ffb11d40a427fc1fd70ac2ac97fd01952913

I've been able to reproduce this on multiple systems that have SKHynix_HFS256GD9TNI-L2B0B SSDs.

From recent commit descriptions in libnvme and nvme-cli, it sounds like some NVMe devices DMA only in 4k blocks, but libnvme would sometimes allocate a smaller buffer, which can result in the DMA operation clobbering unrelated memory.

To reproduce:

1. Make sure the kernel isn't using IOMMU (e.g., boot with intel_iommu=off).
2. Run `while nvme list; do sleep 0.1; done`.

Generally the nvme process will segfault or abort with an error within a very small number of iterations. Example dmesg output when this happens:

[ 2238.591144] show_signal_msg: 6 callbacks suppressed
[ 2238.591150] nvme[1315]: segfault at 8 ip 00007fbf286748e9 sp 00007ffe4cbccb30 error 4 in libc.so.6[7fbf28603000+155000] likely on CPU 1 (core 1, socket 0)
[ 2238.591178] Code: 24 18 45 85 d2 0f 85 17 05 00 00 48 81 fb ff 03 00 00 76 20 43 8d 44 2d 0c 48 8d 44 c5 00 48 8b 10 48 8d 48 f0 48 39 ca 74 0a <48> 39 5a 08 0f 83 2b 05 00 00 41 8d 4d 01 43 8d 44 2d 0e 89 cf 48

If you keep running this, you'll also find that other processes start crashing as well, usually with segfaults or weird shared library failures. I've seen sshd crash, firefox crash, systemd segfault, etc.

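The loop in step 2 stops at the first failure but does not record how many iterations that took or how the process exited. A minimal sketch of a bounded variant follows, assuming a POSIX-style shell and a `sleep` that accepts fractional seconds (as GNU coreutils does). The function name `repro_loop` and its arguments are made up for illustration and are not part of nvme-cli.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvme-cli): run a command up to MAX times,
# stop at the first non-zero exit status, and report the iteration number.
# Usage: repro_loop MAX COMMAND [ARGS...]
repro_loop() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        # Discard the command's output; only the exit status matters here.
        "$@" >/dev/null 2>&1
        rc=$?
        if [ "$rc" -ne 0 ]; then
            echo "iteration $i: exit status $rc"
            return 1
        fi
        sleep 0.1
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
    return 0
}
```

On an affected machine this could be invoked as `repro_loop 200 nvme list`. A process killed by SIGSEGV is normally reported by the shell as exit status 139 (128 + 11), which distinguishes a segfault from an ordinary error exit.
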
As an example of the crashes in other processes, I recently saw sshd fail with this error:

Oct 26 19:46:27 challenger sshd[1361]: /usr/sbin/sshd: error while loading shared libraries: /lib/x86_64-linux-gnu/libnsl.so.2: unexpected PLT reloc type 0x00

I was able to trivially apply the two git commits listed above to libnvme 1.3 in Bookworm, and this resolved the crash and memory corruption caused by `nvme list`.

I'd recommend applying these changes to libnvme in Bookworm, since the impact is pretty severe for users who happen to own affected devices.

There have also been other recent memory alignment changes in libnvme and nvme-cli. It may be worth trying to backport more of these to the Bookworm packages to avoid memory corruption during other nvme operations.

-- System Information:
Debian Release: 12.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-13-amd64 (SMP w/6 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libnvme1 depends on:
ii  libc6        2.36-9+deb12u3
ii  libdbus-1-3  1.14.10-1~deb12u1
ii  libjson-c5   0.16-2
ii  libssl3      3.0.11-1~deb12u2

libnvme1 recommends no packages.

Versions of packages libnvme1 suggests:
ii  nvme-cli  2.4+really2.3-3

-- no debconf information