On Mon, 2020-12-14 at 20:19 +1100, Eyal Lebedinsky wrote:
> On 14/12/2020 13.20, Chris Murphy wrote:
> > On Sun, Dec 13, 2020 at 4:42 PM Eyal Lebedinsky
> > <fed...@eyal.emu.id.au> wrote:
> > >
> > > I am not sure which list this should go to, so I am starting here.
> > >
> > > I run f32 fully updated
> > >     5.9.13-100.fc32.x86_64
> > > on relatively new hardware
> > >     kernel: DMI: Gigabyte Technology Co., Ltd. Z390 UD/Z390 UD, BIOS F8 05/24/2019
> > >
> > > boot/root/swap/data is on nvme
> > >     WD Blue SN550 1TB M.2 2280 NVMe SSD WDS100T2B0C
> >
> > I can't tell from WD's website if there's any newer firmware
> > available. They seem to hide this information within the
> > Windows-only software "Western Digital Dashboard". If you have
> > Windows already installed, it's straightforward to install this and
> > find out if the firmware is up to date.
>
> Option 1) My nvme disk is on the mobo, which has only one slot. I
> have access to a Windows laptop, but I will also need an external
> NVMe/USB adapter - will the Dashboard work this way?
> Will a fw update leave the disk content safe?
>
> I will try something else first.
>
> > There is a boot parameter 'nvme_core.default_ps_max_latency_us',
> > which takes a value in usec, but I can't find a value specific to
> > this make/model of NVMe. My gut instinct is that it's a hack put in
> > by upstream kernel developers to work around the lack of a proper
> > autodetect solution between the PCIe interface and the drive. I
> > would sooner return the drive and get one known to work. I can
> > vouch for Crucial, Seagate, and Samsung SSDs and NVMe drives, for
> > the most part.
>
> Option 2) Reading the reports (and more), I decided to test the boot
> parameter
>     nvme_core.default_ps_max_latency_us=0
> which I understand turns off the APST feature.
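>
> For reference, one way to apply and verify this on Fedora - a sketch
> only, assuming the nvme-cli package is installed and the drive is
> /dev/nvme0:
>
>     # make the parameter persistent for all installed kernels
>     sudo grubby --update-kernel=ALL --args="nvme_core.default_ps_max_latency_us=0"
>
>     # after a reboot, confirm the module parameter took effect
>     cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
>
>     # inspect the APST feature (feature id 0x0c) on the controller;
>     # with the parameter at 0 it should report APSTE as disabled
>     sudo nvme get-feature /dev/nvme0 -f 0x0c -H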
> > Oh, here's a bug report:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1844905
> >
> > That leads here:
> > https://bugzilla.kernel.org/show_bug.cgi?id=208123
> >
> > Comment 1 is a more solid lead than comment 2, because comment 2 is
> > a value that is based on what? A guess? Reading the rest of the
> > thread, it's still uncertain.
> >
> > > For the second time this disk stopped working (the first time was
> > > about two months ago).
> > > It seems that the disk failed hard and could not be reset; the
> > > machine was powered off/on.
> > > I think (not sure) that last time I just hit the reset button but
> > > it did not boot.
> > >
> > > The machine was booted (after a dnf update) around 8pm, and
> > > crashed at 4am.
> > >
> > > Following the earlier crash a serial console was set up, which is
> > > how I can see the failure messages.
> > >
> > > == nvme related messages
> > > [    7.488638] nvme nvme0: pci function 0000:06:00.0
> > > [    7.536593] nvme nvme0: allocated 32 MiB host memory buffer.
> > > [    7.541819] nvme nvme0: 8/0/0 default/read/poll queues
> > > [    7.558122] nvme0n1: p1 p2 p3 p4
> > > [   19.590010] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null)
> > > [   20.653500] Adding 16777212k swap on /dev/nvme0n1p2. Priority:-2 extents:1 across:16777212k SSFS
> > > [   20.820539] EXT4-fs (nvme0n1p3): re-mounted. Opts: (null)
> > > [   23.137206] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
> > > [   23.210717] EXT4-fs (nvme0n1p4): mounted filesystem with ordered data mode. Opts: (null)
> > > ## nothing unusual for 8 hours, then
> > > [28972.459036] nvme nvme0: I/O 840 QID 6 timeout, aborting
> > > [28972.464757] nvme nvme0: I/O 565 QID 7 timeout, aborting
> > > [28972.470277] nvme nvme0: I/O 566 QID 7 timeout, aborting
> > > [28973.291025] nvme nvme0: I/O 989 QID 1 timeout, aborting
> > > [28978.603061] nvme nvme0: I/O 990 QID 1 timeout, aborting
> > > [29002.667243] nvme nvme0: I/O 840 QID 6 timeout, reset controller
> > > [29032.875421] nvme nvme0: I/O 24 QID 0 timeout, reset controller
> > > [29074.097644] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
> > > [29074.110354] nvme nvme0: Abort status: 0x371
> > > [29074.114953] nvme nvme0: Abort status: 0x371
> > > [29074.119523] nvme nvme0: Abort status: 0x371
> > > [29074.124114] nvme nvme0: Abort status: 0x371
> > > [29074.128710] nvme nvme0: Abort status: 0x371
> > > [29096.645478] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
> > > [29096.652210] nvme nvme0: Removing after probe failure status: -19
> > > [29119.165921] nvme nvme0: Device not ready; aborting reset, CSTS=0x1
> > > ## many I/O errors on nvme0 (p2/p3/p4) repeating until a reboot at 8:30am
> > > ## one different message, appearing just once:
> > > [29123.800844] nvme nvme0: failed to set APST feature (-19)
> >
> > I'd take the position that it's defective and permit the
> > manufacturer a short leash to convince me otherwise via a tech
> > support call or email. But I really wouldn't just wait around for
> > another two months not knowing if it's going to fail again. I'd
> > like some kind of answer for this problem from the support folks.
> > And if they can't give support, get rid of it.
> >
> > The time frame for a repeat of the problem is why I'm taking a
> > slightly different view than the tinker-with-firmware view earlier.
> > It's not horrible to update firmware and give it a go if this
> > problem happens once a week or more often. But every two months?
> > Forget it. Make it their problem.
> >
> > And seriously, I give them one chance. If they b.s. me and it
> > flakes out again in a month or two, no more chances. So the
> > quandary is: what's your return policy window? If it's about to
> > end, just return it now. It should just work out of the box. WDC
> > does contribute to the kernel; whether this is a product supported
> > on Linux I don't know.
>
> Option 3) Get a new disk from a reliable brand (as mentioned on this
> thread) and keep this one as a spare. I will do this if the problem
> happens again.
>
> I will log an issue with WD and see what they have to say.
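>
> Before raising the issue, it may be worth capturing the drive's own
> view of its health and error history to attach to the support case -
> a sketch, again assuming nvme-cli and /dev/nvme0:
>
>     # SMART data: temperature, spare capacity, media errors
>     sudo nvme smart-log /dev/nvme0
>
>     # recent controller error-log entries
>     sudo nvme error-log /dev/nvme0
>
>     # controller identity; the "fr" field is the firmware revision
>     sudo nvme id-ctrl /dev/nvme0 | grep '^fr '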
>
> Thanks everyone
>
> --
> Eyal Lebedinsky (fed...@eyal.emu.id.au)

You mentioned only a single NVMe slot on your motherboard. If you have
an available PCIe slot, there's a nifty adapter you can buy for a
second NVMe drive:

https://www.amazon.com/GLOTRENDS-Adapter-Aluminum-Heatsink-PA09_HS/dp/B07FN3YZ8P/

--Doc Savage
Fairview Heights, IL
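If you do add a second drive through such an adapter, a quick sanity
check that it enumerated properly - a sketch, assuming nvme-cli is
installed:

    # the adapter-mounted drive should appear as a second controller
    lspci | grep -i 'non-volatile'

    # list all NVMe namespaces with model and firmware revision
    sudo nvme list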