On Jul 13, 2009, at 11:33 AM, Ross <no-re...@opensolaris.org> wrote:
Gaaah, looks like I spoke too soon:
$ zpool status
  pool: rc-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 2h59m, 77.89% done, 0h50m to go
config:

        NAME              STATE     READ WRITE CKSUM
        rc-pool           DEGRADED     0     0     0
          mirror          DEGRADED     0     0     0
            c4t1d0        ONLINE       0     0     0  218M resilvered
            replacing     UNAVAIL      0  963K     0  insufficient replicas
              c4t2d0s0/o  FAULTED   1.71M 23.4M     0  too many errors
              c4t2d0      REMOVED      0  964K     0  67.0G resilvered
            c5t1d0        ONLINE       0     0     0  218M resilvered
          mirror          ONLINE       0     0     0
            c4t3d0        ONLINE       0     0     0
            c5t2d0        ONLINE       0     0     0
            c5t0d0        ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c5t3d0        ONLINE       0     0     0
            c4t5d0        ONLINE       0     0     0
            c4t4d0        ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c5t4d0        ONLINE       0     0     0
            c5t5d0        ONLINE       0     0     0
            c4t6d0        ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c4t7d0        ONLINE       0 13.0K     0
            c5t6d0        ONLINE       0     0     0
            c5t7d0        ONLINE       0     0     0
        logs              DEGRADED     0     0     0
          c6d1p0          ONLINE       0     0     0

errors: No known data errors
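
For anyone following along later, the action text above boils down to one of
these two commands (c4t8d0 is just a placeholder for whatever spare you'd
use, not a disk in this pool):

$ zpool clear rc-pool                   # reset the error counters once the
                                        # underlying fault is fixed
$ zpool replace rc-pool c4t2d0 c4t8d0   # or swap the suspect disk for a spare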
There are a whole bunch of errors in /var/adm/messages:
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.warning] WARNING: /p...@1,0/pci1022,7...@1/pci11ab,1...@2/d...@2,0 (sd3):
Jul 13 15:56:53 rob-036        Error for Command: write(10)  Error Level: Retryable
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]  Requested Block: 83778048  Error Block: 83778048
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]  Vendor: ATA  Serial Number:
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]  Sense Key: Aborted_Command
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]  ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
Jul 13 15:57:31 rob-036 scsi: [ID 107833 kern.warning] WARNING: /p...@1,0/pci1022,7...@1/pci11ab,1...@2/d...@2,0 (sd3):
Jul 13 15:57:31 rob-036        Command failed to complete...Device is gone
Not what I would expect from a brand new drive!!
Does anybody have any tips on how I can work out where the fault lies
here? I wouldn't expect the controller to be at fault with so many other
drives on it working fine, and what on earth is the proper technique for
replacing a drive that failed part way through a resilver?
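
My best guess at a recovery sequence, if someone can confirm it's sane
(device names are from the status output above, except c4t8d0 which is a
stand-in for a fresh disk):

$ zpool detach rc-pool c4t2d0       # drop the dead newcomer; this cancels
                                    # the half-finished replace and leaves
                                    # the old disk as a plain mirror member
$ zpool status rc-pool              # check what name the old disk is
                                    # listed under now
$ zpool replace rc-pool c4t2d0s0 c4t8d0   # then retry onto the fresh disk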
I really believe there is a problem with either the cabling or the
enclosure's backplane here.
Two disks is statistical coincidence; three disks means it ain't the
disks that are bad (assuming you've checked that there was no recall and
that the firmware is correct and up to date).
Fix the real problem and the disks already in place should resilver
without further interruption.
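
One way to test that theory before condemning more disks: the sd driver
keeps per-device error counters and FMA keeps the raw ereports, and
transport errors that follow a slot or cable rather than a drive show up
clearly in both:

$ iostat -En          # Soft/Hard/Transport error counters per device;
                      # transport errors point at cabling/backplane, not media
$ fmdump -eV          # the raw ereports, with the full device path for each
$ fmadm faulty        # what FMA has actually diagnosed

Once the cabling/backplane is sorted, 'zpool clear rc-pool' resets the
counters and 'zpool status -x' should go back to reporting the pool
healthy when the resilver finishes.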
-Ross