On Feb 16, 2010, at 4:47 PM, Christo Kutrovsky wrote:

> Just finished reading the following excellent post:
>
> http://queue.acm.org/detail.cfm?id=1670144
>
> And started thinking about what would be the best long-term setup for a home
> server, given a limited number of disk slots (say 10).
>
> I considered something like simply doing a 2-way mirror. What are the chances
> of a very specific drive failing in a 2-way mirror? What if I do not want to
> take that chance?
The probability of a device failing in a time interval T, given its MTBF (or
AFR, but be careful about how the vendors publish such specs [*]), is:

    Pfailure = 1 - e^(-T/MTBF)

So if you have a consumer-grade disk with a 700,000 hour rated MTBF, then over
a time period of 1 year (8760 hours) you get:

    Pfailure = 1 - e^(-8760/700000) = 1.24%

> I could always put "copies=2" (or more) on my important datasets and take
> some risk and tolerate such a failure.

+1

> But chances are, everything that is not copies=2 will have some data on those
> devices, and will be lost.
>
> So I was thinking, how can I limit the damage, how to inject some kind of
> "damage control".

The problem is that MTBF measurements are only one part of the picture.
Murphy's Law says something will go wrong, so also plan on backups.

> One of the ideas that sparked is to have a "max devices" property for each
> dataset, and limit how many mirrored devices a given dataset can be spread
> on. I mean, if you don't need the performance, you can limit (minimize) the
> devices, should your capacity allow this.
>
> Imagine this scenario:
> You lost 2 disks, and unfortunately you lost the 2 sides of a mirror.

Doing some simple math, and using the simple MTTDL[1] model [**], you can
figure the probability of that happening in one year for a pair of 700k hour
disks and a 24 hour MTTR as:

    Pfailure = 0.000086%

(trust me, I've got a spreadsheet :-) For a 2-way mirror,
MTTDL[1] = MTBF^2 / (2 * MTTR) = 700,000^2 / 48, or about 1.02e10 hours, and
8760 hours / 1.02e10 hours works out to roughly 0.000086%.

> You have 2 choices to pick from:
> - lose Mary's, Gary's, and Kelly's "documents" entirely
> or
> - lose a small piece of everyone's "documents".
>
> This could be implemented via something similar to:
> a read/write property "target device spread"
> a read-only property "achieved device spread", as this will be size dependent.
>
> Opinions?

I use mirrors. For the important stuff, like my wife's photos and articles, I
set copies=2. And I take regular backups via snapshots to multiple disks, some
of which are offsite. With an appliance, like NexentaStor, it is trivial to
set up a replication scheme between multiple ZFS sites.

> Remember. The goal is damage control. I know 2x raidz2 offers better
> protection for more capacity (although less performance, but that's not the
> point).

Notes:
*  http://blogs.sun.com/relling/entry/awesome_disk_afr_or_is
** http://blogs.sun.com/relling/entry/a_story_of_two_mttdl

 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 15-17, 2010)
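
If you want to redo the math above with your own numbers, here is a rough
shell equivalent of that spreadsheet (just a sketch; it uses awk's exp() and
the same 700k hour MTBF, 24 hour MTTR, and 1-year interval as above):

    awk 'BEGIN {
        T = 8760; MTBF = 700000; MTTR = 24             # all in hours
        # single disk: probability of failing within T hours
        printf "single disk:  %.2f%%\n", 100 * (1 - exp(-T / MTBF))
        # 2-way mirror, simple MTTDL[1] model: lose the second side within one MTTR
        mttdl = MTBF * MTBF / (2 * MTTR)
        printf "mirror pair:  %.6f%%\n", 100 * (1 - exp(-T / mttdl))
    }'

It prints the same 1.24% and 0.000086% figures, and makes it easy to try other
MTBF or MTTR values.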
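
For completeness, the copies=2 / snapshot / send routine described above boils
down to something like the following (a sketch only; the pool, dataset, host,
and snapshot names are made up, and NexentaStor wraps this kind of
send/receive mechanics in its own replication setup):

    # keep two copies of every block in the important dataset
    zfs set copies=2 tank/photos

    # point-in-time snapshot to back up from
    zfs snapshot tank/photos@week1

    # full replication of that snapshot to another pool (possibly offsite)
    zfs send tank/photos@week1 | ssh backuphost zfs recv backup/photos

    # the following week, send only the changes between the two snapshots
    zfs snapshot tank/photos@week2
    zfs send -i @week1 tank/photos@week2 | ssh backuphost zfs recv backup/photos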