On 01/25/11 06:52 AM, Ashley Nicholls wrote:
Hello all,

I'm having a problem that I find difficult to diagnose.

I have an IBM x3550 M3 running Nexenta Core Platform 3.0.1 (134f) with 7x 6-disk RAIDZ2 vdevs (see listing at bottom). Every day a disk fails with "Too many checksum errors", is marked as degraded, and is rebuilt onto a hot spare. I've been running 'zpool detach zpool002 <degraded disk>' to remove the degraded disk from the zpool and return the pool's status to 'ONLINE'. Later that day (or sometimes the next day) another disk is marked as degraded due to checksum errors and is rebuilt onto a hot spare. Rinse, repeat.

We've been logging this for the past few days, and a couple of things stand out:

1. The disk that fails appears to be the hot spare that the previous failure was rebuilt onto.
2. If I don't detach the degraded disk, the newly rebuilt hot spare does not seem to fail.

I'm running a scrub now to confirm there are no further checksum errors; then I will detach the 'degraded' drive from the pool and see whether the new hot spare fails in the next 24 hours. Has anyone seen this before?
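For reference, the scrub-then-detach sequence described above looks roughly like this (the pool name zpool002 is from my listing; the device name c1t5d0 is a placeholder for whichever disk is degraded):

```shell
# Kick off a scrub, then watch the per-device CKSUM counts
zpool scrub zpool002
zpool status -v zpool002

# Once the scrub completes clean, detach the degraded disk
# (c1t5d0 is a hypothetical device name here)
zpool detach zpool002 c1t5d0

# Confirm the pool has returned to ONLINE
zpool status -x zpool002
```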

I used to see these all the time on a Thumper. They magically vanished when I upgraded the drive firmware.

Check whether your drives' firmware is up to date.
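On Solaris/Nexenta you can read each drive's firmware revision without extra tools; one way (disk path in the second command is a placeholder):

```shell
# The Revision field in the verbose error/inquiry listing is the
# drive firmware version; also shows soft/hard/transport error counts
iostat -En

# If smartmontools is installed, smartctl reports it as well
# (/dev/rdsk/c1t5d0 is a hypothetical device path)
smartctl -i /dev/rdsk/c1t5d0
```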

--
Ian.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss