Ross,
The disks do have problems - that's why I'm resilvering.
I've seen it loop with zero read, write, or checksum errors reported. Right
now I do have a number of read errors on some of the disks, but that misses
the point: resilvering isn't doing its job if it can't cope with corrupt
data or a disk with a small amount of unreadable data.
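A quick way to check whether a loop correlates with device errors is to
watch the per-device counters while it runs. A minimal sketch, assuming a
pool named tank:

  # per-device read/write/checksum counters, plus any files with errors
  zpool status -v tank
  # reset the counters so the next pass starts from zero
  zpool clear tank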
-Galen
On Jul 13, 2009, at 6:50 AM, Ross Walker wrote:
Maybe the disks' firmware is bad, or maybe they're jumpered for 1.5Gbps on
a 3.0Gbps-only bus? Or maybe it's a problem with the disk
cable/bay/enclosure/slot?
It sounds like there is more than ZFS in the mix here. I wonder if the
drives' status keeps flapping online/offline and either ZFS or FMA is too
lax about marking a drive offline after recurring timeouts.
Take a look at your disk enclosure, and at iostat -En for the number of
timeouts happening.
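Something like this pulls the per-device error totals in one pass (a
minimal sketch; on Solaris, iostat -En reports soft, hard, and transport
error counts per device, and transport errors usually point at cabling or
the enclosure rather than the media):

  # one line per device: Soft Errors / Hard Errors / Transport Errors
  iostat -En | grep 'Errors:'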
-Ross
On Jul 13, 2009, at 9:05 AM, Galen <gal...@zinkconsulting.com> wrote:
Ross,
I feel you here, but I don't have much of a solution.
The best I can suggest (and what has been my solution) is to take the
problematic disk out, copy it to a fresh disk (preferably using something
like dd_rescue), and then re-install it; a rough sketch follows.
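The sequence looks roughly like this (a minimal sketch; c4t2d0 and c5t2d0
stand in for your failing and replacement disks, the device paths are
illustrative, and dd_rescue is a separate install):

  # copy as much as possible off the failing disk; unlike plain dd,
  # dd_rescue skips unreadable sectors instead of aborting on read errors
  dd_rescue /dev/rdsk/c4t2d0s0 /dev/rdsk/c5t2d0s0
  # after physically swapping the clone into the original slot,
  # scrub so ZFS verifies and repairs anything the copy missed
  zpool scrub tank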
It seems the resilvering loop is generally the result of a faulty device,
but even after that device is taken offline, the problem persists. I have
had so many zpool resilvering loops it's not funny. I'm running 2009.06
with all updates applied, and I've had a very, very bad batch of disks.
I actually have a resilvering loop running right now, and I need to
go copy off the offending device. Again.
I wish I had a better solution, because the zpool functions fine with no
data errors, but the resilver loops forever. I love ZFS as an on-disk
format; I increasingly hate the implementation of the ZFS software.
-Galen
On Jul 13, 2009, at 5:34 AM, Ross wrote:
Just look at this. I thought all the restarting resilver bugs
were fixed, but it looks like something odd is still happening at
the start:
Status immediately after starting resilver:
# zpool status
  pool: rc-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 0.00% done, 57h3m to go
config:

        NAME              STATE     READ WRITE CKSUM
        rc-pool           DEGRADED     0     0     0
          mirror          DEGRADED     0     0     0
            c4t1d0        ONLINE       0     0     0  5.56M resilvered
            replacing     DEGRADED     0     0     0
              c4t2d0s0/o  FAULTED  1.71M 23.3M     0  too many errors
              c4t2d0      ONLINE       0     0     0  5.43M resilvered
            c5t1d0        ONLINE       0     0     0  5.55M resilvered
And a few minutes later:
# zpool status
  pool: rc-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 0.00% done, 245h21m to go
config:

        NAME              STATE     READ WRITE CKSUM
        rc-pool           DEGRADED     0     0     0
          mirror          DEGRADED     0     0     0
            c4t1d0        ONLINE       0     0     0  1.10M resilvered
            replacing     DEGRADED     0     0     0
              c4t2d0s0/o  FAULTED  1.71M 23.3M     0  too many errors
              c4t2d0      ONLINE       0     0     0  824K resilvered
            c5t1d0        ONLINE       0     0     0  1.10M resilvered
It's gone from 5MB resilvered back down to 1MB, and the estimated time has
jumped from 57 hours to 245 hours.
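One cheap way to catch a restart in the act is to log the progress line on
an interval and watch for the percentage falling back toward zero. A
minimal sketch, using the pool name from the output above:

  # timestamped progress snapshot every 60 seconds; a restarted
  # resilver shows up as the 'done' percentage dropping
  while true; do
    echo "$(date): $(zpool status rc-pool | grep 'scrub:')"
    sleep 60
  done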
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss