FYI, I have captured the `sudo lspci -vv` output on kernel 5.8 *before*
the issue occurred ( https://pastebin.ubuntu.com/p/GtZyTWzKTd/ ). It is
subtly different from the output on the 5.4 kernel (which has not had
the issue), in case that matters.

I was also able to reproduce the issue again by causing high disk I/O.
Specifically, writes needed to be occurring for it to happen (I was
recursively grepping the whole filesystem while installing apt/pip
packages inside a Docker container).

This froze the system for 120 seconds until the write timeouts
occurred, after which the disk was remounted read-only. From that point
on, commands on the system failed with I/O errors (even basic ones such
as "top", although some, such as "mount", still worked).

Our plan was to retrieve more information by copying the lspci binary
and its libraries into a tmpfs in RAM, so that it would still be
accessible once the disk stopped responding. This almost worked, but it
appears a few more configuration files would need to be placed in RAM (I
could run "lspci --help" but not "lspci" or "lspci -vv"). Instead, popey
has suggested using a USB key with debootstrap/chroot. (Any suggestions
for how we can retrieve more information at this point, and any commands
that would be useful to run, are welcome.)
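
For anyone wanting to try the same tmpfs approach, here is a minimal
sketch of the copy step. It is not the exact commands that were run: it
assumes GNU coreutils (for `cp --parents`), and `stage_binary` and
`/mnt/rescue` are hypothetical names chosen for illustration. Note that
ldd only reports shared libraries, so any data or configuration files a
tool opens at run time are not covered and would have to be identified
separately (for example with strace).

```shell
#!/bin/sh
# Sketch only: copy a binary plus the shared libraries reported by ldd
# into a staging directory, keeping the original absolute paths.
# "stage_binary" is a hypothetical helper name, not an existing tool.
stage_binary() {
    bin=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Copy the binary itself; --parents recreates its directory path
    # under $dest and -L follows symlinks so real files are copied.
    cp --parents -L "$bin" "$dest" || return 1
    # ldd prints lines such as
    #   libfoo.so.1 => /lib/x86_64-linux-gnu/libfoo.so.1 (0x...)
    #   /lib64/ld-linux-x86-64.so.2 (0x...)
    # so extract every absolute path and copy it the same way.
    ldd "$bin" | grep -o '/[^ ]*' | while read -r lib; do
        cp --parents -L "$lib" "$dest"
    done
}

# Example use (needs root for the mount; the size is an assumption):
#   mkdir -p /mnt/rescue
#   mount -t tmpfs -o size=64m tmpfs /mnt/rescue
#   stage_binary "$(command -v lspci)" /mnt/rescue
```

Because the paths are preserved, the staged tree can be used either via
chroot or by pointing LD_LIBRARY_PATH at the copied library directories.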

As a side note, if I use REISUB (
https://en.m.wikipedia.org/wiki/Magic_SysRq_key#Uses ) to reboot the
machine, it enters a Dell BIOS/recovery screen that states "No Hard
Disk is found". After a full power-off, the machine works again.

-- 
https://bugs.launchpad.net/bugs/1910866

Title:
  nvme drive fails after some time
