On Sun, 28 Mar 2021 at 21:51, Tim via users <users@lists.fedoraproject.org> wrote:
> On Sun, 2021-03-28 at 19:30 -0300, George N. White III wrote:
> > There have also been efforts to predict eminent drive failure (e.g.,
> > using S.M.A.R.T) but without much success.
>
> It took me a moment to wonder what would be famous/respected about
> drive failures. ;-) But I've often wondered if SMART does anything
> useful. If it detects an imminent problem it needs to notify you about
> it, and with a warning that's understandable.
>

I have obtained a warranty replacement on the basis of the S.M.A.R.T.
report. For disk-intensive processing I recommend replacing drives
before the warranty expires, because the failure rate increases shortly
after end-of-warranty. The price of a new drive is low compared to the
value of the time lost dealing with a drive that fails in service, and
I was usually able to double the capacity of the original drive.

>
> I used to see system emails like this:
>
> The following warning/error was logged by the smartd daemon:
> Device: /dev/sdb, 4 Offline uncorrectable sectors
> For details see host's SYSLOG (default: /var/log/messages).
>
> Which were useful to me, but probably obscure to a lot of people. That
> was on a system with two drives, one in use and one bodgy one for
> testing, and the errors never increased over several years. It was
> always consistently telling me that.
>
> I'm recently seeing info like this in logwatch emails:
>
> **Unmatched Entries**
> Device: /dev/sda [SAT], CHECK POWER STATUS spins up disk (0x81 ->
> 0xff)
>
> Which makes little sense to me. The system is a 24/7 server, not often
> rebooted. It's a solid state drive, and I don't know what the hex that
> means (pun intended). I've no idea if that's an error, or if it's just
> telling me that drive has changed modes (idle/active).
>
> And I don't know what kind of warnings people get who don't have system
> emails anymore.

Gnome: https://developer.gnome.org/notification-spec/ uses dbus.
https://sourceforge.net/projects/gsmartcontrol/

As usual, Arch has excellent documentation:
https://wiki.archlinux.org/index.php/S.M.A.R.T. discusses notification
strategies, including email and desktop.

Temperature and flooding are the most urgent out-of-bounds conditions.
There are many systems for reporting these conditions using cell-phone
technology, and there are USB-controlled switches/relays that could be
used to trigger one of these systems.

>
> Logically I'd expect that if SMART thought the drive might need
> checking or chucking, it'd start to give me useful warnings ahead of
> time, and I might be lucky enough to backup my files before disaster
> struck. But the warnings ain't that useful. And, of course, it's
> entirely possible for a drive to spontaneously fail before any
> scheduled SMART test took place.
>

For me, the most common advance warning of a drive about to fail has
been users complaining that their system is too slow. This is usually
accompanied by some S.M.A.R.T. evidence despite a "healthy" status
report. I have also seen widespread problems with older drives after a
winter power outage that left the building much colder than normal.

--
George N. White III