We've been talking a lot recently about failure rates and types of
failures.  As you may know, I do look at field data and generally don't
ask the group for more.  But this time, for various reasons (I may have
found a bug or deficiency), I'm soliciting more data at large.

What I'd like to gather is the error rate per byte transferred.  This
data is collected in kstats, but it is reset when you reboot.  One of the
features of my vast collection of field data is that it is often collected
rather soon after a reboot.  Thus, not many bytes have been transferred
yet, and the corresponding error rates tend to be small (often 0).  A perfect
sample would come from a machine connected to lots of busy disks that
has been up for a very long time.

Can you help?  It's really simple.  Just email me the output of:
        kstat -pc disk
        kstat -pc device_error
for systems that have been up a while and, preferably, have lots of disks.

From this data I (or you) can calculate error rates per I/O or per byte.
I'll be doing statistical analysis, so the more samples, the better.
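
If you want to crunch the numbers yourself, here is a rough sketch of how
one might post-process the two saved outputs into an errors-per-byte figure.
It is only an illustration: the file names are made up, and the statistic
names ("nread", "nwritten", "Hard Errors", "Soft Errors", "Transport
Errors") are the usual sd kstat fields but may differ for other drivers.

        #!/usr/bin/env python
        # Sketch: per-disk error counts per byte transferred, from saved
        # "kstat -pc disk" and "kstat -pc device_error" output.
        from collections import defaultdict

        def parse_kstat(path):
            # kstat -p prints tab-separated module:instance:name:statistic<TAB>value
            stats = defaultdict(dict)
            with open(path) as f:
                for line in f:
                    try:
                        key, value = line.rstrip("\n").split("\t", 1)
                        module, instance, name, statistic = key.split(":", 3)
                    except ValueError:
                        continue          # skip anything that doesn't parse
                    stats[name][statistic] = value
            return stats

        disk = parse_kstat("kstat_disk.out")          # kstat -pc disk
        errs = parse_kstat("kstat_device_error.out")  # kstat -pc device_error

        for name, s in sorted(disk.items()):
            nbytes = int(s.get("nread", 0)) + int(s.get("nwritten", 0))
            e = errs.get(name + ",err", {})   # e.g. "sd0" pairs with "sd0,err"
            nerrs = sum(int(e.get(k, 0)) for k in
                        ("Hard Errors", "Soft Errors", "Transport Errors"))
            rate = nerrs / float(nbytes) if nbytes else 0.0
            print("%-8s %16d bytes %6d errors  %.3e errors/byte" %
                  (name, nbytes, nerrs, rate))

Feeding it the two saved files should print one line per disk.  But really,
just send me the raw output and I'll take care of the rest.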

Note: this data is somewhat imprecise regarding specific failure modes.
kstats are just counters, and detailed per-failure analysis requires
different telemetry.  However, overall rates, even simple counts, should
be a leading indicator.

Please e-mail the output directly to me.  Thanks.
 -- richard