Hello guys,
I've already posted this on the FreeBSD forums, but so far the
feedback hasn't been great; it seems the FreeBSD folks aren't that keen
on ZFS. I'm hoping you'll have more experience with this kind of error:
I have a ZFS pool made up of two 3-disk RAID-Z1 vdevs, which I recently
moved from OS X to FreeBSD 8-STABLE.
One hard disk failed last weekend, and loudly: lots of SMART messages
and even a kernel panic.
I attached a new disk and started the replacement.
Unfortunately, about 20% into the replacement, a second disk in the
same RAID-Z1 started misbehaving and returning read errors. The
resilver did finish, though, and according to zpool status it left me
with only three damaged files:
[r...@camelot /]# zpool status -v tank
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 10h42m with 136 errors on Tue Mar 2 07:55:05 2010
config:
        NAME              STATE     READ WRITE CKSUM
        tank              DEGRADED   137     0     0
          raidz1          ONLINE       0     0     0
            ad17p2        ONLINE       0     0     0
            ad18p2        ONLINE       0     0     0
            ad20p2        ONLINE       0     0     0
          raidz1          DEGRADED   326     0     0
            replacing     DEGRADED     0     0     0
              ad16p2      OFFLINE      2  169K     6
              ad4p2       ONLINE       0     0     0  839G resilvered
            ad14p2        ONLINE       0     0     0  5.33G resilvered
            ad15p2        ONLINE     418     0     0  5.33G resilvered
errors: Permanent errors have been detected in the following files:
        tank/DVD:<0x9cd>
        tank/d...@20100222225100:/Memento.m4v
        tank/d...@20100222225100:/Payback.m4v
        tank/d...@20100222225100:/TheManWhoWasntThere.m4v
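To read a config table like the one above at a glance, a small filter can list only the devices whose READ, WRITE or CKSUM counters are non-zero. This is a sketch, assuming the column layout shown above (NAME, STATE, READ, WRITE, CKSUM); the function name errored_devices is hypothetical, and abbreviated counts such as 169K are simply treated as non-zero strings.

```shell
#!/bin/sh
# Sketch: read `zpool status` output on stdin and print
# "NAME READ WRITE CKSUM" for every config row with a non-zero counter.
# The helper name is hypothetical; counts like "169K" compare as strings.
errored_devices() {
  awk '
    $1 == "errors:" { in_cfg = 0 }               # end of the config table
    in_cfg && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
      print $1, $3, $4, $5
    }
    $1 == "NAME" && $2 == "STATE" { in_cfg = 1 }  # header row starts the table
  '
}
```

Used as "zpool status -v tank | errored_devices", and fed the output above, it should print the tank line, the second raidz1, ad16p2 and ad15p2, which is where the 137 pool-level read errors are coming from.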
I suspect the problems on ad15p2 are caused by a bad cable: the drive
shows no SMART errors, is fairly new (3 months old), and was, IMHO,
sufficiently "burned in" by repeatedly filling it to the brim and
verifying the contents (via ZFS). So I'd like to shut down the server,
replace the cable, and then run a scrub to make sure the drive doesn't
produce any more errors.
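The scrub-and-check step of that plan could be scripted roughly as follows. This is a sketch, assuming the pool is named tank as in the status output; the POOL variable and the helper names scrub_running and scrub_and_wait are hypothetical, and nothing runs until scrub_and_wait is called by hand.

```shell
#!/bin/sh
# Sketch only: scrub a pool, wait for it to finish, then print the result.
# POOL defaults to "tank"; the helper names below are hypothetical.
POOL=${POOL:-tank}

# Succeeds (exit 0) if the status text on stdin shows a scrub or resilver
# still running. Older zpool prints a "scrub:" line, newer ones "scan:".
scrub_running() {
  awk '($1 == "scrub:" || $1 == "scan:") && /in progress/ { found = 1 }
       END { exit (found ? 0 : 1) }'
}

# Starts a scrub, polls every 5 minutes until it is done, then shows the
# full status so the READ/WRITE/CKSUM counters can be compared to before.
scrub_and_wait() {
  zpool scrub "$POOL" || return 1
  while zpool status "$POOL" | scrub_running; do
    sleep 300
  done
  zpool status -v "$POOL"
}
```

Polling the status text rather than relying on a specific flag keeps the sketch independent of the zpool version; the only assumption is that an unfinished scrub or resilver is reported with the words "in progress".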
Unfortunately, although zpool status says the resilver completed, I
can't detach ad16p2 (the first failed disk) from the pool:
[r...@camelot /]# zpool detach tank ad16p2
cannot detach ad16p2: no valid replicas
To be honest, I don't know how to proceed. My system feels very
fragile right now, with a replacement that hasn't finished and errors
on two drives in the same RAID-Z1 vdev.
I deleted the affected files, but I have about 20 snapshots of this
filesystem, and since the files are quite old, I think they're in most
of those snapshots.
So, what should I do now? Delete all snapshots? Move all other files
from this filesystem to a new filesystem and destroy the old
filesystem? Try to export and import the pool? Is it even safe to
reboot the machine right now?
I got one response on the FreeBSD forums telling me to reboot the
machine and run a scrub afterwards; ZFS should then detect that it no
longer needs the old disk. To be honest, I'm a bit reluctant to do
that...
Any help would be appreciated.
Thank you.
Christian
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss