On 08/16/10 12:37 PM, Richard Elling wrote:
On Aug 15, 2010, at 4:59 PM, Ian Collins wrote:

I look after an x4500 for a client and we keep getting drives marked as 
degraded with just over 20 checksum errors.

Most of these errors appear to be driver or hardware related and their frequency 
increases during a resilver, which can lead to a death spiral.  The increase in errors 
within a vdev during a resilver (I recently had three drives in an 8-drive raidz vdev 
"degraded") points to high read activity triggering the bug.

I would like to raise the threshold for marking a drive degraded, to give me more 
time to spot and clear the checksum errors.  Is this possible?
There is no documented, sysadmin-visible interface for this.
The settings in question can be set as properties in the zfs-diagnosis.conf
file, similar to the props set in other FMA modules.

The source is also currently available.
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/fm/modules/common/zfs-diagnosis/zfs_de.c#957

Examples of setting FMA module properties are in
/usr/lib/fm/fmd/plugins/cpumem-retire.conf
and other .conf files.
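
For example, the fmd plugin .conf files use a simple "setprop name value"
syntax, so setting the module's remove_timeout property would look something
like this (a sketch only -- I haven't tested it, and the value is made up):

  #
  # /usr/lib/fm/fmd/plugins/zfs-diagnosis.conf
  # Same setprop convention as cpumem-retire.conf.
  #
  setprop remove_timeout 60sec

fmd only reads module .conf files at module load, so restart it to pick up
the change:

  # svcadm restart svc:/system/fmd:default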

Thanks for the links, Richard.

Looking through the code, the only configurable property read from the file is 
remove_timeout.  Anything else will require code changes.  Maybe it's time to 
upgrade the box to something newer than Solaris 10!
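
In the meantime I'll probably just script around it: periodically log and then
clear the counters before the diagnosis engine trips.  A rough sketch (the pool
name is made up, and I haven't verified that a clear also resets fmd's view of
the error history):

  #!/bin/sh
  # Stopgap: log and reset per-device checksum counters before
  # they accumulate to the ~20-error degrade threshold.
  POOL=tank                              # hypothetical pool name
  # Keep a record of the counts so errors can still be spotted later.
  zpool status -v "$POOL" >> /var/tmp/zpool-status.log
  # Reset the error counters (and any transient fault state) pool-wide.
  zpool clear "$POOL"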

--
Ian.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
