I've decided to upgrade my home server capacity by replacing the disks in one of my mirror vdevs. The procedure appeared to work out, but during the resilver, a couple million checksum errors were logged on the new device. I've read through quite a bit of the archive and searched around, but cannot find anything definitive to ease my mind about whether to proceed.
SunOS deepthought 5.10 Generic_142901-13 i86pc i386 i86pc

  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.00% done, 691h28m to go
config:

        NAME              STATE     READ WRITE CKSUM
        tank              DEGRADED     0     0     0
          mirror          DEGRADED     0     0     0
            replacing     DEGRADED   215     0     0
              c1t6d0s0/o  FAULTED      0     0     0  corrupted data
              c1t6d0      ONLINE       0     0   215  3.73M resilvered
            c1t2d0        ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c1t1d0        ONLINE       0     0     0
            c1t5d0        ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            c1t0d0        ONLINE       0     0     0
            c1t4d0        ONLINE       0     0     0
        logs
          c8t1d0p1        ONLINE       0     0     0
        cache
          c2t1d0p2        ONLINE       0     0     0

During the resilver, the cache device and the ZIL were both removed for errors (1-2k each). (Despite the c2/c8 discrepancy, they are partitions on the same OCZ Vertex II device.)

# zpool status -xv tank
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 9h20m with 0 errors on Sat Jun 19 22:07:27 2010
config:

        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          mirror      ONLINE       0     0     0
            c1t6d0    ONLINE       0     0 2.69M  539G resilvered
            c1t2d0    ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     0
        logs
          c8t1d0p1    REMOVED      0     0     0
        cache
          c2t1d0p2    REMOVED      0     0     0

I cleared the errors (about 5000/GB resilvered!), removed the cache device, and replaced the ZIL partition with the whole device. After 3 pool scrubs with no errors, I want to check with someone else that it appears okay to replace the second drive in this mirror vdev.
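For reference, the recovery steps amounted to roughly the sequence below. This is a dry-run sketch, not a transcript: the `zpool` calls are shadowed so the commands are printed rather than executed, `NEWDISK` is a placeholder for the second replacement drive, and the exact command used to swap the log partition for the whole device may have differed.

```shell
#!/bin/sh
# Dry-run sketch: shadow zpool so each command is printed, not executed.
zpool() { echo "zpool $*"; }

NEWDISK=c1t3d0                         # placeholder, not a real device name

zpool clear tank                       # reset the accumulated checksum counters
zpool remove tank c2t1d0p2             # drop the errored L2ARC cache partition
zpool replace tank c8t1d0p1 c0t0d0     # swap the ZIL partition for the whole SSD
zpool scrub tank                       # verify clean scrubs before proceeding
zpool replace tank c1t2d0 "$NEWDISK"   # then upgrade the mirror's second disk
```

On a real pool you would drop the `zpool()` wrapper and substitute the actual replacement device for `NEWDISK`, checking `zpool status` between steps.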
The one thing I have not tried is a large file transfer to the server, as I am also dealing with an NFS mount problem which popped up suspiciously close to my most recent patch update.

# zpool status -v tank
  pool: tank
 state: ONLINE
 scrub: scrub completed after 3h26m with 0 errors on Mon Jun 21 01:45:00 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
        logs
          c0t0d0    ONLINE       0     0     0

errors: No known data errors

/var/adm/messages is positively overrun with these triplets/quadruplets, not all of which end up as the "fatal" type:

Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci1043,8...@7/d...@1,0 (sd14):
Jun 19 21:43:19 deepthought     Error for Command: write(10)    Error Level: Retryable
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]       Requested Block: 26721062       Error Block: 26721062
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]       Vendor: ATA     Serial Number:
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]       Sense Key: Aborted Command
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]       ASC: 0x8 (LUN communication failure), ASCQ: 0x0, FRU: 0x0
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci1043,8...@7/d...@1,0 (sd14):
Jun 19 21:43:19 deepthought     Error for Command: write(10)    Error Level: Retryable
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]       Requested Block: 26721062       Error Block: 26721062
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]       Vendor: ATA     Serial Number:
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]       Sense Key: Aborted Command
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci1043,8...@7/d...@1,0 (sd14):
Jun 19 21:43:19 deepthought     Error for Command: write(10)    Error Level: Fatal
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]       Requested Block: 26721062       Error Block: 26721062
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]       Vendor: ATA     Serial Number:
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]       Sense Key: Aborted Command

In the past this kern.notice ID has come up as "informational" for others, and in my case it _only_ occurred during the initial resilver.

One last point of interest: the new drive is a WD Green WD10EARS, and the old ones are WD Green WD6400AACS (all of which I have tested on another system with the WD read-test utility). I know these drives get their share of ridicule (and occasional praise/satisfaction), but I'd appreciate any thoughts on proceeding with the mirror upgrade. [Backups are a check.]

Justin
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss