Jeff Bonwick wrote:
btw: I'm really surprised at how unreliable SATA disks are. I put a
dozen TBs of data on ZFS recently, and after just a few days I got a
few hundred checksum errors (raid-z was in use there). These disks
are 500GB drives in a 3511 array. Well, that would explain some of
the fsck's, etc. we saw before.

I suspect you've got a bad disk or controller.  A normal SATA drive
just won't behave this badly.  Cool that RAID-Z survives it, though.

I had a power supply go bad a few months ago (cheap PC-junk power supply)
and it trashed a bunch of my SATA and IDE disks [*] (though, happily,
not the IDE disk I scavenged from a Sun V100 :-).  The symptoms were
thousands of non-recoverable reads which were remapped until the
disks ran out of spare blocks.  Since I didn't believe this, I got a
new, more expensive, and presumably more reliable power supply.
The IDE disks fared better, but I had to do a low-level format on
the SATA drive.  All is well now and zfs hasn't shown any errors
since.  But, thunderstorm season is approaching next month...
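
In case it's useful, here is roughly how I keep an eye on the
counters: a quick Python sketch, not an official tool.  It assumes
the usual five-column NAME/STATE/READ/WRITE/CKSUM layout of
"zpool status" output and only handles plain integer counters
(zpool abbreviates very large counts, e.g. "1.2K", which this
sketch simply skips).

#!/usr/bin/env python
# Rough sketch: run `zpool status` and report any device whose
# READ/WRITE/CKSUM counters are nonzero.  Assumes the usual
# five-column config layout (NAME STATE READ WRITE CKSUM).

import subprocess

def nonzero_error_devices():
    out = subprocess.run(["zpool", "status"], capture_output=True,
                         text=True, check=True).stdout
    in_config = False
    flagged = []
    for line in out.splitlines():
        cols = line.split()
        if cols[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True          # start of the per-device table
            continue
        if not cols or cols[0].startswith("errors:"):
            in_config = False         # blank line or trailer ends the table
            continue
        if in_config and len(cols) >= 5 and all(c.isdigit() for c in cols[2:5]):
            name, state = cols[0], cols[1]
            read, write, cksum = (int(c) for c in cols[2:5])
            if read or write or cksum:
                flagged.append((name, state, read, write, cksum))
    return flagged

if __name__ == "__main__":
    for name, state, r, w, c in nonzero_error_devices():
        print("%-24s %-10s read=%d write=%d cksum=%d" % (name, state, r, w, c))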

I am also trying to collect field data that shows such failure modes,
specifically looking for clusters of errors.  However, I can't promise
anything, and may not get much time for an in-depth study anytime soon.
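
For anyone curious what I mean by "clusters of errors", below is a
rough sketch of the analysis in Python.  The timestamp format, the
10-minute gap, and the stdin plumbing are all just assumptions for
illustration; the idea is simply to group error timestamps into
bursts, so a power-supply event shows up as a few dense bursts
rather than errors sprinkled evenly over months.

# Rough sketch of the "clusters of errors" idea: take a list of error
# timestamps (however you extract them; the input format below is
# hypothetical) and merge events separated by less than GAP into one burst.

from datetime import datetime, timedelta
import sys

GAP = timedelta(minutes=10)   # assumed burst-separation threshold

def cluster(timestamps, gap=GAP):
    """Return a list of (start, end, count) bursts from the timestamps."""
    bursts = []
    for t in sorted(timestamps):
        if bursts and t - bursts[-1][1] <= gap:
            start, _, n = bursts[-1]
            bursts[-1] = (start, t, n + 1)   # extend the current burst
        else:
            bursts.append((t, t, 1))         # start a new burst
    return bursts

if __name__ == "__main__":
    # Hypothetical input: one ISO-8601 timestamp per line on stdin,
    # e.g. grepped out of /var/adm/messages by whatever means you like.
    times = [datetime.strptime(line.strip(), "%Y-%m-%dT%H:%M:%S")
             for line in sys.stdin if line.strip()]
    for start, end, n in cluster(times):
        print("%s .. %s  %d error(s)" % (start, end, n))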

[*] my theory is that disks are about the only devices still using
12VDC power.  Some disk vendors specify the quality of the 12VDC
supply (e.g. ripple) for specific drives.  In my case, the 12VDC
rail was the only common-mode failure in the system that would have
trashed most of the drives in this manner.
 -- richard
