On 01/25/11 06:52 AM, Ashley Nicholls wrote:
Hello all,
I'm having a problem that I find difficult to diagnose.
I have an IBM x3550 M3 running Nexenta Core Platform 3.0.1 (b134f) with
seven 6-disk RAIDZ2 vdevs (see listing at bottom).
Every day a disk fails with "Too many checksum errors", is marked as
degraded, and is rebuilt onto a hot spare. I've been running 'zpool
detach zpool002 <degraded disk>' to remove the degraded disk from the
pool and return the pool's status to ONLINE. Later that day (or
sometimes the next day) another disk is marked as degraded due to
checksum errors and is rebuilt onto a hot spare again; rinse, repeat.
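For reference, the daily cycle looks roughly like this (the pool name
is real, but c2t5d0 below is a placeholder, not one of the actual
devices):

  # show which disk is degraded and which spare has taken over
  zpool status -x zpool002

  # once the spare has finished resilvering, detach the degraded disk
  # so the spare takes its place permanently
  zpool detach zpool002 c2t5d0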
We've been logging this for the past few days, and there are a few
things worth noting:
1. The disk that fails appears to be the hot spare that we rebuilt
onto the previous time.
2. If I don't detach the degraded disk, the newly rebuilt hot spare
does not seem to fail.
I'm running a scrub now to confirm there are no further checksum
errors, and then I'll detach the 'degraded' drive from the pool and
see whether the new hot spare fails within the next 24 hours. Has
anyone seen this before?
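FWIW, the check I'm doing is roughly:

  # scrub the pool, then inspect the per-device checksum counters
  zpool scrub zpool002
  zpool status -v zpool002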
I used to see these all the time on a Thumper. They magically vanished
when I upgraded the drive firmware.
Check whether your drives' firmware is up to date.
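On Solaris-based systems, something like this will dump the firmware
revision each drive reports (output details vary by driver):

  # one line per device: vendor, model, and firmware revision
  iostat -En | grep 'Revision:'

Compare that against the latest revision your drive vendor publishes.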
--
Ian.