I am running ZFS version 3 on SunOS zen 5.10 Generic_118855-33 i86pc i386 i86pc.

What is baffling is that the disk did come online and appear healthy, yet zpool status still reported the filesystem inconsistency. As Miles said, after the disk came back the resilver did not resume.
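(For reference, the manual kick that Jeff suggests in the quoted thread below would, as I understand it, look roughly like this -- untested on my side, and <disk> stands for whichever mirror component dropped out:

# zpool online external <disk>
# zpool status external         # should then report the resilver as in progress
)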
The only additions I have to the sequence shown are:

1) I am absolutely sure there were no disk writes in the interim, since the non-global zones which use these filesystems were halted during the operation.

2) The first time I unplugged the disk, I was upgrading to a larger disk, so I still have that original disk intact.

3) I was afraid that ZFS might resilver backwards, i.e. from the 22% image back onto the original copy, so I pulled the new disk out again.

Current status:

# zpool status
  pool: external
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Sat Jun 21 07:42:03 2008
config:

        NAME           STATE     READ  WRITE  CKSUM
        external       ONLINE    26.57   114      0
          c12t0d0p0    ONLINE        4   114      0
          mirror       ONLINE    26.57     0      0
            c13t0d0p0  ONLINE    55.25  4.48K     0
            c16t0d0p0  ONLINE        0      0  53.14

Can I be sure that the unrecoverable error found is on the failed mirror?

I was thinking of the following ways forward. Any comments most welcome:

1) Run a scrub. I am thinking that kicking this off might actually corrupt data in the second vdev, so maybe starting with 2 would be a better idea...

2) Physically replace disk1 with the ORIGINAL disk2 and attempt a scrub. (A rough command sketch for both options is in the P.S. at the very bottom, below Miles's quoted message.)

justin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Miles Nordin
Sent: 21 June 2008 02:46
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] zfs mirror broken?

>>>>> "jb" == Jeff Bonwick <[EMAIL PROTECTED]> writes:

    jb> If you say 'zpool online <pool> <disk>' that should tell ZFS
    jb> that the disk is healthy again and automatically kick off a
    jb> resilver.

    jb> Of course, that should have happened automatically.

With b71 I find that it does sometimes happen automatically, but the
resilver isn't enough to avoid checksum errors later.  Only a
manually-requested scrub will stop any more checksum errors from
accumulating.

Also, if I reboot before one of these auto-resilvers finishes, or plug
in the component that flapped while powered down, the auto-resilver
never resumes.

    >> While one vdev was resilvering at 22% (HD replacement), the
    >> original disk went away

So if I understand you, it happened like this:

        #1       #2
        online   online
    t   online   UNPLUG
    i   online   UNPLUG               <-- filesystem writes
    m   online   UNPLUG               <-- filesystem writes
    e   online   online
    |   online   resilver -> online
    v   UNPLUG   xxx online           --> fs reads allowed?  how?
        online   online                   why no resilvering?

It seems to me that the right thing to do after #1 is unplugged is to
take the whole pool UNAVAIL until the original disk #1 comes back.
When the original disk #1 drops off, the only available component left
is the #2 component that flapped earlier and is being resilvered, so #2
is out-of-date and should be ignored.

But I'm pretty sure ZFS doesn't work that way, right?  What does it do?
Will it serve incorrect, old data?  Will it somehow return I/O errors
for data that has changed on #1 and not been resilvered onto #2 yet?
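P.S. For concreteness, a rough sketch of the commands behind the two options above. The exact steps are my own assumptions (in particular the export/import around the physical swap), so corrections are welcome before I try either.

Option 1, scrub in place:

# zpool scrub external
# zpool status -v external      # once finished: per-device error counts and any
                                # files with permanent errors

Option 2, put the ORIGINAL disk2 back and then scrub:

# zpool export external         # quiesce the pool before touching the hardware
  (physically swap disk1 for the original disk2)
# zpool import external
# zpool scrub external
# zpool status -v external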