Ross,
The disks do have problems - that's why I'm resilvering.
I've seen it loop with zero read, write, or checksum errors reported. Right
now I do have a number of read errors on some of the disks, but that misses
the point: resilvering isn't doing its job if it can't cope with corrupt
data or a disk with a small amount of unreadable data.
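A quick way to check whether a loop correlates with device errors is to
watch the per-device counters while it runs. A minimal sketch, assuming a
pool named tank:

  # per-device read/write/checksum counters, plus any files with errors
  zpool status -v tank
  # reset the counters so the next pass starts from zero
  zpool clear tank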
-Galen
On Jul 13, 2009, at 6:50 AM, Ross Walker wrote:
Maybe the disks' firmware is bad, or maybe they're jumpered for 1.5Gbps on
a 3.0Gbps-only bus? Or maybe it's a problem with the disk
cable/bay/enclosure/slot?
It sounds like there is more than ZFS in the mix here. I wonder if the
drives' status keeps flapping online/offline and either ZFS or FMA is too
lax about marking a drive offline after recurring timeouts.
Take a look at your disk enclosure, and at iostat -En for the number of
timeouts happening.
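Something like this pulls the per-device error totals in one pass (a
minimal sketch; on Solaris, iostat -En reports soft, hard, and transport
error counts per device, and transport errors usually point at cabling or
the enclosure rather than the media):

  # one line per device: Soft Errors / Hard Errors / Transport Errors
  iostat -En | grep 'Errors:'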
-Ross
On Jul 13, 2009, at 9:05 AM, Galen <gal...@zinkconsulting.com> wrote:
Ross,
I feel you here, but I don't have much of a solution.
The best I can suggest (and what has been my solution) is to take the
problematic disk out, copy it to a fresh disk (preferably using something
like dd_rescue), and then re-install it; a rough sketch follows.
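The sequence looks roughly like this (a minimal sketch; c4t2d0 and c5t2d0
stand in for your failing and replacement disks, the device paths are
illustrative, and dd_rescue is a separate install):

  # copy as much as possible off the failing disk; unlike plain dd,
  # dd_rescue skips unreadable sectors instead of aborting on read errors
  dd_rescue /dev/rdsk/c4t2d0s0 /dev/rdsk/c5t2d0s0
  # after physically swapping the clone into the original slot,
  # scrub so ZFS verifies and repairs anything the copy missed
  zpool scrub tank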
It seems the resilvering loop is generally the result of a faulty device,
but even after that device is taken offline, the problem persists. I have
had so many zpool resilvering loops it's not funny. I'm running 2009.06
with all updates applied, and I've had a very, very bad batch of disks.
I actually have a resilvering loop running right now, and I need to
go copy off the offending device. Again.
I wish I had a better solution, because the zpool functions fine with no
data errors, but the resilver loops forever. I love ZFS as an on-disk
format; I increasingly hate the implementation of the ZFS software.
-Galen
On Jul 13, 2009, at 5:34 AM, Ross wrote:
Just look at this. I thought all the restarting resilver bugs
were fixed, but it looks like something odd is still happening at
the start:
Status immediately after starting resilver:
# zpool status
  pool: rc-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 0.00% done, 57h3m to go
config:

        NAME              STATE     READ WRITE CKSUM
        rc-pool           DEGRADED     0     0     0
          mirror          DEGRADED     0     0     0
            c4t1d0        ONLINE       0     0     0  5.56M resilvered
            replacing     DEGRADED     0     0     0
              c4t2d0s0/o  FAULTED  1.71M 23.3M     0  too many errors
              c4t2d0      ONLINE       0     0     0  5.43M resilvered
            c5t1d0        ONLINE       0     0     0  5.55M resilvered
And a few minutes later:
# zpool status
  pool: rc-pool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 0.00% done, 245h21m to go
config:

        NAME              STATE     READ WRITE CKSUM
        rc-pool           DEGRADED     0     0     0
          mirror          DEGRADED     0     0     0
            c4t1d0        ONLINE       0     0     0  1.10M resilvered
            replacing     DEGRADED     0     0     0
              c4t2d0s0/o  FAULTED  1.71M 23.3M     0  too many errors
              c4t2d0      ONLINE       0     0     0  824K resilvered
            c5t1d0        ONLINE       0     0     0  1.10M resilvered
It's gone from 5MB resilvered back down to 1MB, and the estimated time has
jumped from 57 hours to 245 hours.
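One cheap way to catch a restart in the act is to log the progress line on
an interval and watch for the percentage falling back toward zero. A
minimal sketch, using the pool name from the output above:

  # timestamped progress snapshot every 60 seconds; a restarted
  # resilver shows up as the 'done' percentage dropping
  while true; do
    echo "$(date): $(zpool status rc-pool | grep 'scrub:')"
    sleep 60
  done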
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss