On Wed, 13 Jan 2021 at 05:41, Sreyan Chakravarty <sreya...@gmail.com> wrote:
> On Tue, Jan 12, 2021 at 9:16 AM Chris Murphy <li...@colorremedies.com>
> wrote:
>
> > -x has more information that might be relevant including firmware
> > revision and some additional logs for recent drive reported errors
> > which usually are benign. But might be clues.
> >
> > These two attributes I'm not familiar with
> > 187 Reported_Uncorrect 0x0032 100 096 000 Old_age Always - 4294967301
> > 188 Command_Timeout    0x0032 100 100 000 Old_age Always - 98785820672
> >
> > But the value is well above threshold for both so I'm not worried
> > about it.
>
> Here is the output of:
>
> # smartctl -Ax /dev/sda
>
> https://pastebin.com/raw/GrgrQrSf
>
> I have no idea what it means.

You are not alone. Most people stop reading at the line:

SMART overall-health self-assessment test result: PASSED

Before retiring I worked in remote sensing, which is a data-intensive
activity. HDD failures were a major issue. One sure way to kill a drive
was to start a batch job that filled a disk and then kept hammering the
drive over a long weekend when I was off somewhere without network
access. I could usually get warranty replacements for failed drives by
submitting the smartctl reports.

We used XFS, first on SGI IRIX and then on Linux once it became
available, with striped arrays for throughput on I/O-bound processes.
XFS was designed to avoid lengthy filesystem repair times, so getting a
system back after a drive failure just meant waiting for the tape robot
to find and restore the backup tapes.

HDDs are mechanical and so subject to wear. With heavy use they tend to
die shortly after end-of-warranty. I started replacing drives at
end-of-warranty which, along with measures to reduce runaway batch jobs,
greatly reduced the number of failures.

Your drive has logged 1671 power-on hours and 1491 power cycles, which
is a little over an hour of running per power cycle. Mechanical wear is
often highest at startup, so this is probably getting close to the
design lifetime of a consumer laptop HDD.
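As for the very large raw values quoted above for attributes 187 and
188: raw SMART values are 48-bit fields, and some vendors pack several
smaller counters into one field. The sketch below splits a raw value
into three 16-bit words. The 16-bit layout is only an assumption here
(it is vendor-specific and I have not confirmed it for this drive), and
split_raw48 is just an illustrative helper name.

```python
def split_raw48(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, high word first.

    Assumption: the vendor packs three 16-bit counters into the field.
    This is not confirmed for the drive discussed in this thread.
    """
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value does not fit in 48 bits")
    return tuple((raw >> shift) & 0xFFFF for shift in (32, 16, 0))


# Raw values as quoted earlier in this thread.
quoted = {
    "187 Reported_Uncorrect": 4294967301,
    "188 Command_Timeout": 98785820672,
}

for name, raw in quoted.items():
    # Show the value in hex next to its three 16-bit words.
    print(f"{name}: {raw:#014x} -> {split_raw48(raw)}")
```

Under that assumption the two values split into (1, 0, 5) and
(23, 24, 0), which look far more like plausible event counts than the
decimal numbers do. If the layout does apply, smartctl can print the
attribute this way itself, e.g. with "-v 188,raw16".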
There are workloads (image processing, numerical modelling) where
recovering the work done since the last backup just means restarting a
batch job, which is generally easier than trying to repair a filesystem
with a bunch of partially written HDF5 files.

Given the age of your HDD, I would replace it. If your laptop came with
Windows, you should be able to install Windows 10 on a small partition
in order to upgrade the BIOS and maybe run the drive vendor's
diagnostics. You may want to revisit your choices of drive technology,
filesystem, backup and recovery strategy, etc. with your use case in
mind.

> This is the problem with SMART tests, they are so esoteric that it is
> difficult for a common user to make sense of it.
>
> Let me know what you think, if you see any glaring faults.

You are to be commended for helping the btrfs developers investigate one
of the rare situations that make filesystems such a hard problem. My
experience indicates your HDD is involved, either through old age or
some BIOS or drive firmware glitch, so your best way forward is to make
sure your BIOS is current and replace the drive with one suited to your
use case.

--
George N. White III