To follow up on this issue, at one point the stats were down to this:

                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
da0        0.0   0.0     0.0     0.0    0   0.0   0
da1        0.0   0.0     0.0     0.0    0   0.0   0
da2      127.9   0.0   202.3     0.0    1  47.5 100
da3      125.9   0.0   189.3     0.0    1  43.1  97
da4      127.9   0.0   189.8     0.0    1  45.8 100
da5      128.9   0.0   206.3     0.0    0  42.5  99
da6      127.9   0.0   202.3     0.0    1  46.2  98
da7        0.0 249.7     0.0   334.2   10  39.5 100
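As a rough sanity check on those numbers, here is a small Python sketch that works out the average read size per disk and how much time each read takes. The r/s and kr/s values are copied from the iostat output above. The model is an assumption on my part: one outstanding I/O per disk at a time, and an average rotational delay of half a revolution at 7200 RPM. Whatever is left of the per-read time after the rotational delay is lumped together as "seek plus overhead".

```python
# Read-side iostat figures copied from the output above: (device, r/s, kr/s).
READS = [
    ("da2", 127.9, 202.3),
    ("da3", 125.9, 189.3),
    ("da4", 127.9, 189.8),
    ("da5", 128.9, 206.3),
    ("da6", 127.9, 202.3),
]

RPM = 7200  # drive speed mentioned in the post


def avg_read_kb(reads_per_sec, kb_per_sec):
    """Average size of one read, in KB."""
    return kb_per_sec / reads_per_sec


def rotational_latency_ms(rpm):
    """Average rotational delay: half of one revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def implied_seek_ms(reads_per_sec, rpm):
    """Time per read left over after the average rotational delay.

    Assumes the disk services one read at a time (a simplification).
    """
    return 1000.0 / reads_per_sec - rotational_latency_ms(rpm)


for dev, rps, kbps in READS:
    print(
        f"{dev}: {avg_read_kb(rps, kbps):.2f} KB/read, "
        f"{1000.0 / rps:.2f} ms/read, "
        f"{implied_seek_ms(rps, RPM):.2f} ms seek+overhead"
    )
```

Under those assumptions the reads come out at roughly 1.5 to 1.6 KB each, which looks like small random I/O rather than streaming reads, and each disk is spending nearly all of its time on positioning rather than on transferring data.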
At some point, I figured out that about 125 random IOPS is pretty much the limit for 7200 RPM SATA drives. So what we're mostly looking at here is that resilvering a raidz2 is the pathological worst case. Lesson learned: raidz2 is just not really viable without some kind of sorting of the resilver operations. I wish I understood ZFS well enough to do something about that, but research suggests the problem is non-trivial. :(

There also seems to be a separate ZFS issue related to having a very large number of snapshots (e.g. hourly for several months on a couple of filesystems). Some combination of the OS updates we've been doing to get this machine to 9.2-RC1 and deleting a ton of snapshots seems to have helped with that. It would be nice to know which it was; I guess we'll find out in a few months.

So it seems like the combination of these two issues is mostly what is/was plaguing us.

Thanks!