On Thu, Feb 9, 2012 at 2:13 PM, Roy Sigurd Karlsbakk <r...@karlsbakk.net> wrote:
>> I don't quite understand what happened in your specific case. Let's
>> say you had a setup:
>> raidz2 c1d0 c1d1 c1d2 c1d3 spare c1d4 c1d5
>>
>> Let's say c1d3 failed. Resilver started and d4 took d3's place -
>> you now have a non-degraded raidz2. You then physically swapped out d3
>> for a new drive and did "zpool replace". Until the replace command
>> completes, you still have the fully functioning zpool of c1d0 c1d1
>> c1d2 c1d4. When another drive, e.g. c1d2, fails, I would hope the
>> replace command is cancelled (it's cosmetic - d4 is doing fine in
>> place of d3) and the array is instead resilvered with c1d5 in place
>> of c1d2.
>>
>> Is this what happened (other than the specific disk numbers)?
>
> What happened was this:
>
> Server Urd has four RAIDz2 VDEVs, somewhat non-optimally balanced (because of
> a few factors, lack of time being the dominant one), so the largest has 12
> drives (the others 7). In this VDEV, c14t19d0 died, and the common spare,
> c9t7d0, stepped in. I replaced c14t19d0 (zpool offline, cfgadm -c unconfigure ...
> zpool replace dpool c14t19d0 c14t19d0, zpool detach dpool c9t7d0). So far all
> was OK, and the resilver was almost done when c14t12d0 died and c9t7d0 took
> over once more. Now the resilver has been restarted and is still running
> (with high load on the pool as well).

This is a side comment: you should only have run the "zpool detach
dpool c9t7d0" *after* the pool was done resilvering back onto the new
c14t19d0.
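
For what it's worth, this is roughly the ordering I had in mind - only a
sketch using your device names, untested on your setup, with <ap_id>
standing in for whatever cfgadm attachment point that slot has on your box:

  # spare c9t7d0 is already attached after the failure of c14t19d0
  zpool offline dpool c14t19d0
  cfgadm -c unconfigure <ap_id>      # swap the physical disk here
  cfgadm -c configure <ap_id>
  zpool replace dpool c14t19d0 c14t19d0

  # wait until the resilver onto the new c14t19d0 has completed
  zpool status dpool                 # re-run until it reports the resilver done

  # only now release the spare; if I remember right, zpool returns the
  # spare to AVAIL automatically once the replace finishes, in which
  # case this step is a no-op
  zpool detach dpool c9t7d0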


> Now, I can somewhat see the argument in resilvering more drives in parallel 
> to save time, if the drives fail at the same time, but how often do they 
> really do that? Mostly, a drive will fail rather out of sync with others. 
> This leads me to thinking it would be better to let the pool resilver the 
> first device dying and then go on with the second, or perhaps allow for 
> manual override somewhere.
>
> What are your thoughts?

I agree there is a tradeoff between letting a resilver finish and
attempting to replace the newly failed drive ASAP. I would probably
set the threshold at 50%: if the current resilver is >= 50% complete,
let it finish (if possible) before working on the next drive.
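
As a crude manual version of that rule, something like the snippet below
could be run before issuing the next "zpool replace". It is only a sketch:
the exact wording of the resilver progress line in "zpool status" varies
between releases, so the sed pattern may need adjusting, and "dpool" is
just your pool name from above.

  # pull the resilver percentage out of zpool status (expects a "NN.NN% done"
  # style progress line; adjust the pattern if your release prints it differently)
  pct=$(zpool status dpool | sed -n 's/.*[ ,]\([0-9.]*\)% done.*/\1/p')
  pct=${pct%%.*}                     # integer part is enough for this check

  if [ -z "$pct" ]; then
      echo "no resilver in progress - safe to replace now"
  elif [ "$pct" -ge 50 ]; then
      echo "resilver ${pct}% done - let it finish before the next replace"
  else
      echo "resilver only ${pct}% done - replacing now may be the better trade"
  fi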

I think you will get a better explanation on the (still active)
"zfs-disc...@opensolaris.org" mailing list. Someone on that list might
explain the design decision (or say whether it was simply an arbitrary
choice).


Jan
