On 03/05/2017 01:02 PM, Gregory Seidman wrote:
> I have a disk that is reporting SMART errors. It is an active disk in a
> (kernel, not hardware) RAID1 configuration. I also have a hot spare in the
> RAID1, and md hasn't decided it should fail the disk and switch to the hot
> spare. Should I proactively tell md to fail the disk (and let the hot spare
> take over), or should I just wait until md notices a problem?

AFAIK desktop disks and "enterprise RAID" disks degrade differently.
When a desktop disk has trouble reading a sector, it retries many times
before giving up, because the data likely does not exist anywhere else.
An enterprise RAID disk retries only a few times and then reports
failure, because the data should exist elsewhere and hung reads are
intolerable in enterprise environments. So if you are using desktop
disks in a RAID, you might need to intervene manually to compensate for
the mismatch.
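
On many drives the retry limit described above is exposed as SCT Error
Recovery Control, which smartctl can query and, on drives that support
it, set. Here is a rough sketch that only builds the commands and does
not run them. The device name and the value 70 (tenths of a second, so
7 seconds) are assumptions for illustration, and scterc_commands is a
hypothetical helper, not part of any tool.

```python
def scterc_commands(device, deciseconds=None):
    """Build smartctl argument lists for SCT Error Recovery Control.

    With no timeout given, only the query command is returned. With a
    timeout (in tenths of a second), the set command comes first and the
    query follows so the result can be checked. Nothing is executed here.
    """
    query = ["smartctl", "-l", "scterc", device]
    if deciseconds is None:
        return [query]
    # Same limit for reads and writes; whether the drive accepts it
    # depends on the drive.
    set_cmd = ["smartctl", "-l", f"scterc,{deciseconds},{deciseconds}", device]
    return [set_cmd, query]


# Hypothetical device name, printed for review only.
for cmd in scterc_commands("/dev/sda", 70):
    print(" ".join(cmd))
```

Many desktop drives do not support this setting, in which case the
query reports that and manually failing the disk remains the fallback.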

I'm confused by "I also have a hot spare in the RAID1". Do you have a
two-member RAID1 with a hot spare, or a three-member RAID1? I would
prefer the latter:

https://manpages.debian.org/jessie/mdadm/md.4.en.html

If you're planning on buying a fourth disk and adding it after fixing
the RAID, can you instead add it now as a fourth RAID1 member, let it
resilver, remove the failing disk from the RAID (i.e. reconfigure as a
three-member RAID1), and then pull the failing disk?

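
The sequence suggested above could look roughly like the sketch below,
which only prints the mdadm commands so they can be reviewed before
anything is run. It assumes the array is currently a three-member
RAID1. The array and partition names are hypothetical, and
plan_member_swap is an illustrative helper, not part of mdadm.

```python
def plan_member_swap(array, new_disk, failing_disk, members):
    """Return the mdadm commands, in order, for swapping out a RAID1 member.

    members is the number of active members the array has now and should
    have again once the failing disk is gone. Nothing is executed here.
    """
    return [
        # Add the new disk; it joins as a spare at first.
        ["mdadm", array, "--add", new_disk],
        # Grow the mirror by one so the new disk becomes an active member.
        ["mdadm", "--grow", array, f"--raid-devices={members + 1}"],
        # Block until the resync onto the new disk has finished.
        ["mdadm", "--wait", array],
        # Only now mark the failing disk faulty and take it out.
        ["mdadm", array, "--fail", failing_disk],
        ["mdadm", array, "--remove", failing_disk],
        # Shrink back so md does not consider the array degraded.
        ["mdadm", "--grow", array, f"--raid-devices={members}"],
    ]


# Hypothetical names: array /dev/md0, new disk /dev/sdd1, failing /dev/sda1.
for cmd in plan_member_swap("/dev/md0", "/dev/sdd1", "/dev/sda1", 3):
    print(" ".join(cmd))
```

The point of this ordering is that the array never has fewer good
copies than it started with while the new disk is syncing.
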
David