On Sep 30, 2010, at 11:00 PM, Ben Miller wrote:

> On 09/22/10 04:27 PM, Ben Miller wrote:
>> On 09/21/10 09:16 AM, Ben Miller wrote:
>>> I had tried a clear a few times with no luck. I just did a detach and
>>> that did remove the old disk and has now triggered another resilver
>>> which hopefully works. I had tried a remove rather than a detach before,
>>> but that doesn't work on raidz2...
>>>
>>> thanks,
>>> Ben
>>>
>> I made some progress. That resilver completed with 4 errors. I cleared
>> those and still had the one error "<metadata>:<0x0>" so I started a scrub.
>> The scrub restarted the resilver on c4t0d0 again though! There currently
>> are no errors anyway, but the resilver will be running for the next day+.
>> Is this another bug or will doing a scrub eventually lead to a scrub of
>> the pool instead of the resilver?
>>
>> Ben
>
> Well not much progress. The one permanent error "<metadata>:<0x0>" came
> back. And the disk keeps wanting to resilver when trying to do a scrub.
> Now after the last resilver I have more checksum errors on the pool, but
> not on any disks:
>
>         NAME          STATE     READ WRITE CKSUM
>         pool2         ONLINE       0     0    37
>         ...
>           raidz2-1    ONLINE       0     0    74
>
> All other checksum totals are 0. So three problems:
>
> 1. How to get the disk to stop resilvering?
This is a known bug which is fixed in build 135:

  6887372 DTLs not cleared after resilver if permanent errors present

> 2. How do you get checksum errors on the pool, but no disk is identified?
> If I clear them and let the resilver go again more checksum errors appear.
> So how to get rid of these errors?

It may not be possible to determine which disk (or disks) is responsible for
the errors; in that case you'll see a zero counter at the disk level and a
non-zero counter at the raidz level. It may mean that there were more errors
than your raidz can recover from, or that data was corrupted in RAM after
checksumming but before being written... Check your FMA data for any signs
of disk issues.

> 3. How to get rid of the metadata:0x0 error? I'm currently destroying old
> snapshots (though that bug was fixed quite a while ago and I'm running
> b134). I can try unmounting filesystems and remounting next (all are
> currently mounted). I can also schedule a reboot for next week if anyone
> thinks that would help.

This is an error in the pool metadata, and the only way to get rid of it is
to recreate your pool.

Regards,
Victor
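
A minimal sketch of the checks and the recovery path described above,
assuming the pool name pool2 and device c4t0d0 from the thread and build
134-era command syntax; the destination pool name newpool is made up for
illustration, and exact options and output may differ on other builds:

    # Look for disk-level error telemetry recorded by FMA:
    fmdump -eV | less        # raw error reports (ereports) from drivers/devices
    fmadm faulty             # any faults FMA has already diagnosed
    iostat -En               # per-device soft/hard/transport error counters

    # Re-check the pool once the resilver finishes, then clear and scrub:
    zpool status -v pool2    # -v lists objects affected by permanent errors
    zpool clear pool2
    zpool scrub pool2

    # If <metadata>:<0x0> persists, the data has to be copied off and the
    # pool recreated, e.g. with a recursive snapshot and send/receive
    # (newpool is hypothetical):
    zfs snapshot -r pool2@migrate
    zfs send -R pool2@migrate | zfs recv -Fd newpool
    zpool destroy pool2
    # ...then recreate pool2 and send the data back, or keep using newpool.

If fmdump shows repeated ereports against a single device, that disk is the
most likely source of the raidz-level checksum errors even though its own
CKSUM counter reads zero.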