Bug#588965: marked as forwarded (Please add support for replacing a failing but still usable drive with a spare without marking the first drive as failed)

Debian Bug Tracking System Tue, 13 Jul 2010 21:18:20 -0700

Your message dated Wed, 14 Jul 2010 06:16:08 +0200
with message-id <[email protected]>
has caused the report #588965,
regarding Please add support for replacing a failing but still usable drive 
with a spare without marking the first drive as failed
to be marked as having been forwarded to the upstream software
author(s) [email protected]


(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
588965: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=588965
Debian Bug Tracking System
Contact [email protected] with problems

--- Begin Message ---

Dear Neil, the following suggestion seems sensible to me. What do
you think?

thanks,

----- Forwarded message from Andras Korn <[email protected]> -----

Date: Tue, 13 Jul 2010 22:51:42 +0200
From: Andras Korn <[email protected]>
X-EqTo: [email protected]
To: Debian Bug Tracking System <[email protected]>
Subject: Bug#588965: Please add support for replacing a failing but still
        usable drive with a spare without marking the first drive as failed
Message-ID: <[email protected]> 
(sfid-20100713_230018_403676_06D1D418)
X-Debian-PR-Package: mdadm
X-Spam: no (crm114:23.94 SA:-2.6)

Package: mdadm
Version: 3.1.2-2
Severity: wishlist
Tags: upstream

Hi,

Especially in the case of RAID5 arrays, it would often be life-saving to be
able to activate a hot spare and prepare to replace a live drive with it,
without marking that drive as failed first.

Consider the following scenario. Let's say we have a RAID5 array composed of
sdb, sdc and sdd, with sde added as a spare (i.e. 3 active drives).

sdc starts to noticeably fail. Unknown to the user, sdd has also developed a
bad sector. The user marks sdc as failed and waits for sde to be synced;
however, during the resync, the system hits the bad sector on sdd, causing
sdd to also be marked as failed, the resync to fail and the array to become
unusable. (The same can happen if an intermittent bit error occurs during
the resync operation.)

The algorithm I'd like to see implemented would work as follows:

sdc starts to noticeably fail. The user marks it for replacement. sde is
activated and the system copies everything from sdc to sde, using the
redundancy provided by the other drives if/when necessary. Temporarily,
while this operation is in progress, sdc and sde are both active and in the
same slot; any writes that hit the array get committed to both. When sde is
completely up to date, sdc gets deactivated and marked as failed. The bad
sector on sdd doesn't compromise our ability to sync the hot spare, because
the corresponding block can still be read directly from sdc. At this
point, another spare could be added, sdd marked for replacement, and so on.
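The following minimal Python sketch makes the difference between the two recovery paths concrete. It is a hypothetical toy model and not mdadm or kernel code. Each drive holds one block per stripe. Parity rotation is ignored, because in RAID5 any block of a stripe is the XOR of the others. Writes arriving during the copy are not modelled. The names `Drive`, `fail_then_resync` and `hot_replace` are invented for illustration. sdc is given an unreadable block in stripe 1 and sdd one in stripe 2, mirroring the scenario above.

```python
from functools import reduce

# Hypothetical toy model of a 3-drive RAID5; not mdadm or kernel code.


class UnreadableSector(Exception):
    """Raised when a simulated drive cannot read a block."""


class Drive:
    """Toy drive: one block per stripe plus a set of unreadable stripe indices."""

    def __init__(self, name, blocks, bad=()):
        self.name = name
        self.blocks = list(blocks)
        self.bad = set(bad)

    def read(self, stripe):
        if stripe in self.bad:
            raise UnreadableSector(f"{self.name}: stripe {stripe}")
        return self.blocks[stripe]


def xor_blocks(blocks):
    """XOR equal-length byte strings column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def reconstruct(members, slot, stripe):
    """Rebuild the block of `slot` in `stripe` from all other members (parity)."""
    others = [m.read(stripe) for i, m in enumerate(members) if i != slot]
    return xor_blocks(others)


def fail_then_resync(members, slot, spare, n_stripes):
    """Conventional path: `slot` is failed first, so it is never read again."""
    for s in range(n_stripes):
        # Any unreadable block on a surviving member is fatal here.
        spare.blocks[s] = reconstruct(members, slot, s)


def hot_replace(members, slot, spare, n_stripes):
    """Proposed path: copy from the still-active drive, use parity only as fallback."""
    for s in range(n_stripes):
        try:
            spare.blocks[s] = members[slot].read(s)
        except UnreadableSector:
            spare.blocks[s] = reconstruct(members, slot, s)


def make_array():
    """sdb and sdc hold data; sdd holds their XOR, so every stripe XORs to zero."""
    sdb = [bytes([i, i + 1]) for i in range(4)]
    sdc = [bytes([10 * i, 3]) for i in range(4)]
    sdd = [xor_blocks([b, c]) for b, c in zip(sdb, sdc)]
    return [Drive("sdb", sdb), Drive("sdc", sdc, bad={1}), Drive("sdd", sdd, bad={2})]


if __name__ == "__main__":
    N = 4
    members = make_array()
    expected = list(members[1].blocks)  # what sdc is supposed to contain

    spare = Drive("sde", [b""] * N)
    try:
        fail_then_resync(members, 1, spare, N)
        print("fail-then-resync: ok")
    except UnreadableSector as e:
        print("fail-then-resync: array lost,", e)

    spare = Drive("sde", [b""] * N)
    hot_replace(members, 1, spare, N)
    print("hot replace: ok" if spare.blocks == expected else "hot replace: mismatch")
```

In this model the conventional path aborts when it reaches sdd's bad stripe. The replace path completes, because it falls back to parity only for the single block sdc cannot read, and sdd is readable in that stripe.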

I realise this also requires changes to the kernel. Apologies if it's
already planned; I haven't seen it discussed anywhere.

Best regards,

Andras

-- 
 .''`.   martin f. krafft <[email protected]>      Related projects:
: :'  :  proud Debian developer               http://debiansystem.info
`. `'`   http://people.debian.org/~madduck    http://vcs-pkg.org
  `-  Debian - when you have better things to do than fixing systems


--- End Message ---
