I also have this on two systems, both running Ubuntu 24.04.1.
First system:
ASUS Pro WS TRX50-SAGE WIFI + AMD Threadripper 7970X
Ubuntu 24.04 Desktop + kernel 6.11.x from Xanmod
SSDs that have this issue: Solidigm P44 Pro 2TB and Samsung 990 Pro 4TB
Second system:
Gigabyte MZ32-AR0 (rev 1.0) + AM
** Attachment added: "Issue on system booted with
`nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off`"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1910866/+attachment/5836074/+files/homelab-hang-dmesg-no-aspm.log
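For reference, boot parameters like these are typically made persistent by editing /etc/default/grub and regenerating the config; a minimal sketch, assuming the stock Ubuntu defaults:
# in /etc/default/grub (the existing value may differ):
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off"
$ sudo update-grub
$ sudo reboot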
I have upgraded from 22.04 to 24.04, and I have removed the GRUB
configuration "nvme_core.default_ps_max_latency_us=9000".
Let's see if the problem is fixed in this kernel. I will report back.
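For anyone doing the same, whether the workaround is really gone can be checked on the freshly booted kernel; a minimal sketch:
$ cat /proc/cmdline
# nvme_core.default_ps_max_latency_us should no longer appear here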
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.1 LTS
I am so sorry I haven't updated on whether the setting change worked for
my issue.
I promptly forgot the issue existed at all because it never happened
again, even with a month's uptime.
Only today, when upgrading my system from 23.10 to 24.04, was I reminded
of the existence of the change in the grub config.
We have an Ubuntu server running a set of eight Samsung 980 Pro PCIe 4.0
NVMe SSDs (model MZ-V8P1T0BW) on Ubuntu 20.04.3 LTS (GNU/Linux
5.4.0-88-generic x86_64). We've seen this happen at least 5 times over
the past month, and not always on the same SSD. We first saw it happen
on 5.4.0-81. Some sam
I'm seeing this in focal kernel 5.4.0-88. Is this expected? Do I have to
switch to the hwe kernel pointed to above to fix this?
The laptop has been stable for a long time and then suddenly started
having this exact symptom a few days ago. I'm wondering if this was
introduced in the latest GA kernels for
** Also affects: debian
Importance: Undecided
Status: New
This bug was fixed in the package linux - 5.8.0-41.46
---
linux (5.8.0-41.46) groovy; urgency=medium
* groovy/linux: 5.8.0-41.46 -proposed tracker (LP: #1912219)
* Groovy update: upstream stable patchset 2020-12-17 (LP: #1908555) // nvme
drive fails after some time (LP: #1910866)
@Andrew, thank you for testing! I'm switching verification status to
'verification-done-groovy'.
** Tags removed: verification-needed-groovy
** Tags added: verification-done-groovy
@Kleber I have installed the focal hwe kernel from proposed (as seen
below). So far when A/B testing this kernel it is working correctly :-)
I will continue running this kernel and report any issues I have.
Also note that I have been continuously running the test kernel (from
comment 22) since las
Hello Alan or anyone else affected,
The fix for this bug is also available on the hwe kernel for Focal
currently in -proposed (version 5.8.0-41.46~20.04.1). Feedback whether
this kernel fixes the nvme issue would be appreciated.
Thank you.
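For testers unfamiliar with -proposed, enabling it and pulling the hwe kernel looks roughly like this (a sketch for focal on amd64; the exact package selection may differ):
$ echo 'deb http://archive.ubuntu.com/ubuntu focal-proposed main universe' | sudo tee /etc/apt/sources.list.d/focal-proposed.list
$ sudo apt update
$ sudo apt install linux-generic-hwe-20.04/focal-proposed   # should pull 5.8.0-41.46~20.04.1
$ sudo reboot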
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
groovy' to 'verification-done-groovy'. If the problem still exists,
change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.
Thank you Andrew for your feedback!
We have applied the fix for groovy/linux (and focal/linux-hwe-5.8) and
the new kernels will be available in -proposed soon. These packages are
planned to be promoted to -updates early next week.
** Changed in: linux (Ubuntu Groovy)
Status: In Progress => Fix Committed
** Also affects: linux (Ubuntu Groovy)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Groovy)
Status: New => In Progress
@Marcelo So far it looks good :-) It passes the "fio" command test when
A/B testing between a known bad kernel and this new kernel. I will
continue running it on this machine over the weekend to ensure longer
usage doesn't have any remaining issues - but looks like it resolves the
issue so far :-D
Thanks! I'll take a look :-)
Hi, Andrew.
I created a test kernel with the fix and it is available at:
https://kernel.ubuntu.com/~mhcerri/lp1910866_linux-5.8.0-38-generic_5.8.0-38.43+lp1910866_amd64.tar.gz
You can install it by extracting the tarball and installing the Debian
packages:
$ tar xf lp1910866_linux-5.8.0-38-generic_5.8.0-38.43+lp1910866_amd64.tar.gz
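The snippet is cut off here; the remaining steps were presumably along these lines (a sketch, assuming the tarball unpacks the .deb files into the current directory):
$ sudo dpkg -i *.deb   # install the test kernel packages
$ sudo reboot          # then boot into 5.8.0-38.43+lp1910866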
Andrew, we plan to address this in the Focal 5.8 hwe kernel and we're
going to be building a test kernel. We would really appreciate you
testing it since you have a reliable reproducer.
@kaihengfeng Thanks for the quick response! Bug 1908555 linked there
only lists groovy as a target series; I hope that this will also be
applied to the focal HWE kernel :-)
Also, I am happy to test any kernel in a -proposed channel or PPA to
confirm it fixes the issue, if that helps :-)
OK, the fix will be in the next 5.8 update:
commit f62ddacc4cb141b86ed647f9dd9eeb7653b0cc43
Author: Keith Busch
Date: Fri Oct 30 10:28:54 2020 -0700
Revert "nvme-pci: remove last_sq_tail"
BugLink: https://bugs.launchpad.net/bugs/1908555
[ Upstream commit 38210800bf66d7302da1bb
@kaihengfeng
So v5.7 was fine, and after many reboots I have found that the commit
below introduced the issue.
Do I also need to find when the issue was resolved (between v5.8-rc1
and v5.9.10), or is this information enough?
54b2fcee1db041a83b52b51752dade6090cf952f is the first bad commit
Thanks a lot!
Can you please test v5.7? Stable (point) releases aren't linear with the
mainline kernel.
Once you are sure v5.7 is good, we can start a bisect:
$ sudo apt build-dep linux
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git bisect start
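The instructions are cut off here; a plausible continuation, using the good/bad points reported elsewhere in this thread, would be:
$ git bisect good v5.7   # last known good mainline release
$ git bisect bad v5.8    # first series showing the freeze
# build, boot and test each kernel git bisect checks out, then mark it:
$ git bisect good        # if the fio reproducer passes
$ git bisect bad         # if the system freezes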
And then the bisect between 5.4.78 (good) and 5.8.18 (bad).
The following are the results with the mainline kernels:
v5.8.18: FAIL
v5.8.4: FAIL
v5.8-rc5: FAIL
v5.8-rc1: FAIL
v5.7.19: PASS
v5.7.18: PASS
v5.7.16:
So, bisecting between 5.8.18 (bad) and 5.11-rc3 (good).
The following are the results with the mainline kernels:
v5.11-rc3: PASS
v5.9.12: PASS
v5.9.10: PASS
v5.9.9: MISSING
v5.9.8: FAIL (could not boot long enough for full test)
v5.9.
OK, so https://people.canonical.com/~kernel/info/kernel-version-map.html
states that Ubuntu kernel 5.8.0-36.40~20.04.1 matches mainline version
5.8.18. I have installed 5.8.18 and it fails! So it is not the Ubuntu
patches.
Ubuntu Kernels:
linux-image-5.4.0-59-generic: PASS
linux-image
@kaihengfeng
I have found that running the command "fio --name=basic
--directory=/path/to/empty/directory --size=1G --rw=randrw --numjobs=4
--loops=5" runs fine on linux-image-5.4.0-59-generic, but when trying
with linux-image-5.8.0-36-generic it would freeze the system in the
"Laying out IO file" stage.
Andrew, since you can reliably reproduce the issue, can you please test the
latest mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11-rc3/amd64/
And we'll do a bisect or reverse-bisect based on the result.
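Installing a mainline build from that page typically amounts to grabbing the amd64 .deb files and installing them together (a sketch; exact filenames vary per build):
# download the linux-headers, linux-image-unsigned and linux-modules
# amd64 .deb files from the URL above into an empty directory, then:
$ sudo dpkg -i ./*.deb
$ sudo reboot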
FYI, I have captured the `sudo lspci -vv` output on the 5.8 kernel
*before* the issue here: https://pastebin.ubuntu.com/p/GtZyTWzKTd/. It is
subtly different from the 5.4 kernel (which has not had the issue), in
case that matters.
I was also able to reproduce the issue again by causing high disk I/O, s
I've tried doing various IO intensive things to trigger it but no luck
yet.
Note that for me it happens quite rapidly (sometimes after 5-10 minutes
of high disk load). E.g. the first times it happened were when apt was
running update-grub and then when pip3 install was running. Then, to
capture the logs above, I started a `find /` and a `find ~` at the same
time and this was enough to trigger it.
I can try, but I can't trigger it on demand. I had 60 days of uptime on
my system before it happened last time, and 12 days the time before
that. That gives you some idea of the interval between occurrences.
@kaihengfeng Yes, this is a regression after the upgrade from 5.4 to
5.8. After the upgrade I had it multiple times, and now that I have
switched back to 5.4 my machine is stable again.
I do not think I can run `lspci -vv` *after* the issue happens, as my
NVMe drive goes read-only, so all commands fail.
Is this a regression? Did it start to happen after the upgrade from 5.4
to 5.8?
And is it possible to attach `lspci -vv` after the issue happens?
It's the TOSHIBA-RD400 on /home for me that's failing.
I'm on Ubuntu 20.04, and after updating to the HWE 5.8 kernel recently I
have also been suffering from my nvme drive becoming read-only after a
period of time. I have now switched back to the 5.4 kernel and have not
suffered the issue again.
I am on a single-disk system, so I had to run dmesg --follow remotely.
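One way to capture kernel logs when the only disk may go read-only is to stream them to a second machine over ssh (a sketch; the hostname is a placeholder):
$ ssh user@affected-host 'dmesg --follow' | tee nvme-failure-dmesg.log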
Which one is the failing one? Samsung or OCZ?
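A quick way to tell which device node corresponds to which model, assuming nvme-cli is installed:
$ sudo nvme list   # prints node, model and serial for each NVMe drive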