Gaaah, looks like I spoke too soon:

$ zpool status
  pool: rc-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 2h59m, 77.89% done, 0h50m to go
config:

        NAME              STATE     READ WRITE CKSUM
        rc-pool           DEGRADED     0     0     0
          mirror          DEGRADED     0     0     0
            c4t1d0        ONLINE       0     0     0  218M resilvered
            replacing     UNAVAIL      0  963K     0  insufficient replicas
              c4t2d0s0/o  FAULTED  1.71M 23.4M     0  too many errors
              c4t2d0      REMOVED      0  964K     0  67.0G resilvered
            c5t1d0        ONLINE       0     0     0  218M resilvered
          mirror          ONLINE       0     0     0
            c4t3d0        ONLINE       0     0     0
            c5t2d0        ONLINE       0     0     0
            c5t0d0        ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c5t3d0        ONLINE       0     0     0
            c4t5d0        ONLINE       0     0     0
            c4t4d0        ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c5t4d0        ONLINE       0     0     0
            c5t5d0        ONLINE       0     0     0
            c4t6d0        ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c4t7d0        ONLINE       0 13.0K     0
            c5t6d0        ONLINE       0     0     0
            c5t7d0        ONLINE       0     0     0
        logs              DEGRADED     0     0     0
          c6d1p0          ONLINE       0     0     0

errors: No known data errors
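A rough way to narrow down where the fault lies is to pull out every row of that config table that is not ONLINE or has a nonzero READ/WRITE/CKSUM counter. Below is a minimal sketch in Python. It assumes the column layout shown above (NAME STATE READ WRITE CKSUM, then optional notes). The helper name `suspicious_devices` is hypothetical, and the embedded sample is the config section from this post pasted verbatim. It only checks whether a counter is literally "0", so it does not need to interpret the K/M suffixes.

```python
# Hypothetical helper: STATUS is the config section of the zpool status
# output from this post, pasted verbatim.
STATUS = """\
        NAME              STATE     READ WRITE CKSUM
        rc-pool           DEGRADED     0     0     0
          mirror          DEGRADED     0     0     0
            c4t1d0        ONLINE       0     0     0  218M resilvered
            replacing     UNAVAIL      0  963K     0  insufficient replicas
              c4t2d0s0/o  FAULTED  1.71M 23.4M     0  too many errors
              c4t2d0      REMOVED      0  964K     0  67.0G resilvered
            c5t1d0        ONLINE       0     0     0  218M resilvered
          mirror          ONLINE       0     0     0
            c4t3d0        ONLINE       0     0     0
            c5t2d0        ONLINE       0     0     0
            c5t0d0        ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c5t3d0        ONLINE       0     0     0
            c4t5d0        ONLINE       0     0     0
            c4t4d0        ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c5t4d0        ONLINE       0     0     0
            c5t5d0        ONLINE       0     0     0
            c4t6d0        ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c4t7d0        ONLINE       0 13.0K     0
            c5t6d0        ONLINE       0     0     0
            c5t7d0        ONLINE       0     0     0
        logs              DEGRADED     0     0     0
          c6d1p0          ONLINE       0     0     0
"""


def suspicious_devices(status_text):
    """Return (name, state, read, write, cksum) for every row that is not
    ONLINE or has any nonzero READ/WRITE/CKSUM counter."""
    rows = []
    for line in status_text.splitlines():
        fields = line.split()
        # Device rows carry at least NAME STATE READ WRITE CKSUM;
        # skip blank lines and the column header.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, state, read, write, cksum = fields[:5]
        # Counters are printed as "0" when clean, otherwise e.g. "963K".
        counters_nonzero = any(c != "0" for c in (read, write, cksum))
        if state != "ONLINE" or counters_nonzero:
            rows.append((name, state, read, write, cksum))
    return rows


for name, state, read, write, cksum in suspicious_devices(STATUS):
    print(f"{name:<12} {state:<9} read={read} write={write} cksum={cksum}")
```

On this output it flags the replacing vdev and both halves of it, plus c4t7d0, which is still ONLINE but has logged write errors. That last one may be worth watching independently of the drive being replaced.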


There are a whole bunch of errors in /var/adm/messages:

Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.warning] WARNING: /p...@1,0/pci1022,7...@1/pci11ab,1...@2/d...@2,0 (sd3):
Jul 13 15:56:53 rob-036         Error for Command: write(10)               Error Level: Retryable
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Requested Block: 83778048                  Error Block: 83778048
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Vendor: ATA                                Serial Number:
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Sense Key: Aborted_Command
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0


Jul 13 15:57:31 rob-036 scsi: [ID 107833 kern.warning] WARNING: /p...@1,0/pci1022,7...@1/pci11ab,1...@2/d...@2,0 (sd3):
Jul 13 15:57:31 rob-036         Command failed to complete...Device is gone

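When /var/adm/messages is long, a tally per sd instance shows whether the errors come from one disk or are spread across several. Spread-out errors would point more toward the controller or cabling. Below is a minimal sketch in Python. It assumes each WARNING header line ends with the `(sdN):` instance name, as the excerpt does once the mail wrapping is undone. The helper name `tally_scsi_warnings` is hypothetical. It does not map sd instances to cXtYdZ names; that is a separate step. The sample is the excerpt from this post, with the obfuscated device path left as it appears.

```python
import re
from collections import Counter

# Assumed format: a kern.warning header line ending in "(sdN):".
HEADER = re.compile(r"kern\.warning\] WARNING: .*\((sd\d+)\):\s*$")

# Excerpt from this post (mail line-wrapping undone).
LOG = """\
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.warning] WARNING: /p...@1,0/pci1022,7...@1/pci11ab,1...@2/d...@2,0 (sd3):
Jul 13 15:56:53 rob-036         Error for Command: write(10)               Error Level: Retryable
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Requested Block: 83778048                  Error Block: 83778048
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Vendor: ATA                                Serial Number:
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   Sense Key: Aborted_Command
Jul 13 15:56:53 rob-036 scsi: [ID 107833 kern.notice]   ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
Jul 13 15:57:31 rob-036 scsi: [ID 107833 kern.warning] WARNING: /p...@1,0/pci1022,7...@1/pci11ab,1...@2/d...@2,0 (sd3):
Jul 13 15:57:31 rob-036         Command failed to complete...Device is gone
"""


def tally_scsi_warnings(log_text):
    """Count WARNING headers and 'Error for Command' lines per sd instance,
    and record which instances reported 'Device is gone'."""
    warnings = Counter()
    command_errors = Counter()
    gone = set()
    current = None  # sd instance named by the most recent WARNING header
    for line in log_text.splitlines():
        match = HEADER.search(line)
        if match:
            current = match.group(1)
            warnings[current] += 1
        elif current is not None:
            # Detail lines are attributed to the preceding header's device.
            if "Error for Command" in line:
                command_errors[current] += 1
            if "Device is gone" in line:
                gone.add(current)
    return warnings, command_errors, gone


warnings, command_errors, gone = tally_scsi_warnings(LOG)
for dev in sorted(warnings):
    print(dev, "warnings:", warnings[dev],
          "command errors:", command_errors[dev], "gone:", dev in gone)
```

Attributing detail lines to the most recent header is a simplification. If messages from several devices interleave at the same second, the counts could be misattributed.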

Not what I would expect from a brand new drive!!

Does anybody have any tips on how I can work out where the fault lies here? I
wouldn't expect it to be the controller, with so many other drives working. And
what on earth is the proper technique for replacing a drive that failed partway
through a resilver?
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
