So for a general-purpose fileserver using standard SATA connectors on the 
motherboard, with no per-drive status LEDs, this faulty-drive replacement 
routine (based on the info above from myxiplx) should work in the event that a 
drive fails. (I have copied and pasted the example from myxiplx and made a few 
changes for my array/drive IDs.)

---------------------------

- have a cron task run 'zpool status pool' periodically and email you if grep 
detects a 'FAULTED' status
- when you see the email, identify the faulted drive from the output of 
'zpool status pool | grep FAULTED' -- e.g. c1t1d0
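
The cron check could be sketched as something like the following script. The 
pool name and mail recipient here are assumptions for my setup; adjust both 
for yours:

```shell
#!/bin/sh
# Hypothetical cron health check -- pool name and recipient are
# placeholders; adjust for your system.
POOL=pool
ADDR=root

# grep exits 0 only when it finds a match, so this mails the full
# status output whenever any device reports FAULTED.
if zpool status "$POOL" | grep FAULTED > /dev/null; then
    zpool status "$POOL" | mailx -s "zpool $POOL: FAULTED device" "$ADDR"
fi
```

A crontab entry such as '0 * * * * /root/zpool-check.sh' (hourly) would be one 
reasonable schedule.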

- offline the drive with:

# zpool offline pool c1t1d0

- then identify the SATA controller that maps to this drive by running:

# cfgadm | grep Ap_Id ; cfgadm | grep c1t1d0
Ap_Id                          Type         Receptacle   Occupant     Condition
sata0/1::dsk/c1t1d0            disk         connected    configured   ok
# 

And unconfigure it at the controller level with:
# cfgadm -c unconfigure sata0/1

Verify that it is now offline with:
# cfgadm | grep sata0/1
sata0/1 disk connected unconfigured ok

Now remove and replace the disk. For my motherboard (M2N-SLI Deluxe), SATA 
controller 0/1 maps to "SATA 1" in the book -- i.e. SATA connector #1.

Bring the disk online and check its status with:
# cfgadm -c configure sata0/1
# cfgadm | grep sata0/1
sata0/1::dsk/c1t1d0 disk connected configured ok

Bring the disk back into the zfs pool. You will get a warning:
# zpool online pool c1t1d0
warning: device 'c1t1d0' onlined, but remains in faulted state

use 'zpool replace' to replace devices that are no longer present
# zpool replace pool c1t1d0

You will now see zpool status report that a resilver is in progress, with 
detail as follows (example from myxiplx's array). Resilvering is the process 
whereby ZFS recreates the data on the new disk from redundant data: data held 
on the other drives in the array plus parity data.

        raidz2           DEGRADED     0     0     0
          spare          DEGRADED     0     0     0
            replacing    DEGRADED     0     0     0
              c5t7d0s0/o UNAVAIL      0     0     0  corrupted data
              c5t7d0     ONLINE       0     0     0

Once the resilver finishes, run zpool status again and it should appear fine -- 
i.e. array and drives marked as ONLINE and no errors shown.

Note: I sometimes had to run zpool status twice to get an up-to-date status of 
the devices.
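
If you want to wait for the resilver rather than polling by hand, a small loop 
like this would do it. The pool name and the exact 'resilver in progress' 
phrase are assumptions based on the zpool status output of the time; verify 
the wording on your own system first:

```shell
#!/bin/sh
# Hypothetical wait-for-resilver helper -- pool name and status phrase
# are assumptions; check them against your zpool status output.
POOL=pool
while zpool status "$POOL" | grep 'resilver in progress' > /dev/null; do
    sleep 60
done
# Print the final state; run it a second time if the output looks stale.
zpool status "$POOL"
```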

---------------------------

Now I need to print out this info and keep it safe for the time when a drive 
fails. I should also print out the SATA connector mapping for each drive 
currently in my array, in case I'm unable to determine it later for any reason.
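
Generating that mapping could look roughly like this, pairing each pool disk 
with its cfgadm attachment point. The pool name is an assumption, and the 
cfgadm output format is as shown earlier in this post:

```shell
#!/bin/sh
# Sketch: print the SATA attachment point for every disk in the pool,
# suitable for keeping on paper. Pool name is a placeholder.
POOL=pool
zpool status "$POOL" | awk '/c[0-9]+t[0-9]+d[0-9]+/ { print $1 }' |
while read DISK; do
    # each line pairs the controller (e.g. sata0/1) with the disk id
    cfgadm | grep "::dsk/$DISK"
done
```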
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss