On Sep 30, 2010, at 11:00 PM, Ben Miller wrote:

> On 09/22/10 04:27 PM, Ben Miller wrote:
>> On 09/21/10 09:16 AM, Ben Miller wrote:
>>> I had tried a clear a few times with no luck. I just did a detach and
>>> that did remove the old disk and has now triggered another resilver
>>> which hopefully works. I had tried a remove rather than a detach before,
>>> but that doesn't work on raidz2...
>>>
>>> thanks,
>>> Ben
>>>
>> I made some progress. That resilver completed with 4 errors. I cleared
>> those and still had the one error "<metadata>:<0x0>" so I started a scrub.
>> The scrub restarted the resilver on c4t0d0 again though! There currently
>> are no errors anyway, but the resilver will be running for the next day+.
>> Is this another bug or will doing a scrub eventually lead to a scrub of
>> the pool instead of the resilver?
>>
>> Ben
>
> Well not much progress. The one permanent error "<metadata>:<0x0>" came
> back. And the disk keeps wanting to resilver when trying to do a scrub.
> Now after the last resilver I have more checksum errors on the pool, but
> not on any disks:
>
>         NAME          STATE     READ WRITE CKSUM
>         pool2         ONLINE       0     0    37
>         ...
>           raidz2-1    ONLINE       0     0    74
>
> All other checksum totals are 0. So three problems:
>
> 1. How to get the disk to stop resilvering?
This is a known bug which is fixed in build 135:

  6887372 DTLs not cleared after resilver if permanent errors present

> 2. How do you get checksum errors on the pool, but no disk is identified?
> If I clear them and let the resilver go again more checksum errors appear.
> So how to get rid of these errors?

It may not be possible to determine which disk (or disks) is responsible for
the errors; in that case you'll see a zero counter at the disk level and a
non-zero counter at the raidz level. It may mean that there were more errors
than your raidz can recover from, or that data was corrupted in RAM after
checksumming but before being written... Check your FMA data for any signs
of disk issues.

> 3. How to get rid of the metadata:0x0 error? I'm currently destroying old
> snapshots (though that bug was fixed quite a while ago and I'm running
> b134). I can try unmounting filesystems and remounting next (all are
> currently mounted). I can also schedule a reboot for next week if anyone
> thinks that would help.

This is an error in the pool metadata, and the only way to get rid of it is
to recreate your pool.

Regards,
Victor
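
A minimal sketch of the checks and the recovery path described above,
assuming the pool name pool2 and device c4t0d0 from the thread and build
134-era command syntax; the destination pool name newpool is made up for
illustration, and exact options and output may differ on other builds:

    # Look for disk-level error telemetry recorded by FMA:
    fmdump -eV | less        # raw error reports (ereports) from drivers/devices
    fmadm faulty             # any faults FMA has already diagnosed
    iostat -En               # per-device soft/hard/transport error counters

    # Re-check the pool once the resilver finishes, then clear and scrub:
    zpool status -v pool2    # -v lists objects affected by permanent errors
    zpool clear pool2
    zpool scrub pool2

    # If <metadata>:<0x0> persists, the data has to be copied off and the
    # pool recreated, e.g. with a recursive snapshot and send/receive
    # (newpool is hypothetical):
    zfs snapshot -r pool2@migrate
    zfs send -R pool2@migrate | zfs recv -Fd newpool
    zpool destroy pool2
    # ...then recreate pool2 and send the data back, or keep using newpool.

If fmdump shows repeated ereports against a single device, that disk is the
most likely source of the raidz-level checksum errors even though its own
CKSUM counter reads zero.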