On Feb 16, 2010, at 4:47 PM, Christo Kutrovsky wrote:

> Just finished reading the following excellent post:
>
> http://queue.acm.org/detail.cfm?id=1670144
>
> And started thinking about what would be the best long-term setup for a home
> server, given a limited number of disk slots (say 10).
>
> I considered something like simply doing a 2-way mirror. What are the chances
> of a very specific drive failing in a 2-way mirror? What if I do not want to
> take that chance?
The probability of a device failing in a time interval T, given its MTBF (or
AFR, but be careful about how the vendors publish such specs [*]), is:

    Pfailure = 1 - e^(-T/MTBF)

So if you have a consumer-grade disk with a 700,000 hour rated MTBF, then over
a time period of 1 year (8760 hours) you get:

    Pfailure = 1 - e^(-8760/700000) = 1.24%

> I could always put "copies=2" (or more) on my important datasets and take
> some risk and tolerate such a failure.

+1

> But chances are, everything that is not copies=2 will have some data on those
> devices, and will be lost.
>
> So I was thinking, how can I limit the damage, how to inject some kind of
> "damage control".

The problem is that MTBF measurements are only one part of the picture.
Murphy's Law says something will go wrong, so also plan on backups.

> One of the ideas that sparked is to have a "max devices" property for each
> dataset, and limit how many mirrored devices a given dataset can be spread
> on. I mean, if you don't need the performance, you can limit (minimize) the
> devices, should your capacity allow this.
>
> Imagine this scenario:
> You lost 2 disks, and unfortunately you lost the 2 sides of a mirror.

Doing some simple math, and using the simple MTTDL[1] model [**], you can
figure the probability of that happening in one year for a pair of 700k hour
disks and a 24 hour MTTR as:

    Pfailure = 0.000086%

(trust me, I've got a spreadsheet :-) For a 2-way mirror,
MTTDL[1] = MTBF^2 / (2 * MTTR) = 700,000^2 / 48, or about 1.02e10 hours, and
8760 hours / 1.02e10 hours works out to roughly 0.000086%.

> You have 2 choices to pick from:
> - lose Mary's, Gary's, and Kelly's "documents" entirely
> or
> - lose a small piece of everyone's "documents".
>
> This could be implemented via something similar to:
> a read/write property "target device spread"
> a read-only property "achieved device spread", as this will be size dependent.
>
> Opinions?

I use mirrors. For the important stuff, like my wife's photos and articles, I
set copies=2. And I take regular backups via snapshots to multiple disks, some
of which are offsite. With an appliance, like NexentaStor, it is trivial to
set up a replication scheme between multiple ZFS sites.

> Remember. The goal is damage control. I know 2x raidz2 offers better
> protection for more capacity (although less performance, but that's not the
> point).

Notes:
*  http://blogs.sun.com/relling/entry/awesome_disk_afr_or_is
** http://blogs.sun.com/relling/entry/a_story_of_two_mttdl

 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 15-17, 2010)
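
If you want to redo the math above with your own numbers, here is a rough
shell equivalent of that spreadsheet (just a sketch; it uses awk's exp() and
the same 700k hour MTBF, 24 hour MTTR, and 1-year interval as above):

    awk 'BEGIN {
        T = 8760; MTBF = 700000; MTTR = 24             # all in hours
        # single disk: probability of failing within T hours
        printf "single disk:  %.2f%%\n", 100 * (1 - exp(-T / MTBF))
        # 2-way mirror, simple MTTDL[1] model: lose the second side within one MTTR
        mttdl = MTBF * MTBF / (2 * MTTR)
        printf "mirror pair:  %.6f%%\n", 100 * (1 - exp(-T / mttdl))
    }'

It prints the same 1.24% and 0.000086% figures, and makes it easy to try other
MTBF or MTTR values.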
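
For completeness, the copies=2 / snapshot / send routine described above boils
down to something like the following (a sketch only; the pool, dataset, host,
and snapshot names are made up, and NexentaStor wraps this kind of
send/receive mechanics in its own replication setup):

    # keep two copies of every block in the important dataset
    zfs set copies=2 tank/photos

    # point-in-time snapshot to back up from
    zfs snapshot tank/photos@week1

    # full replication of that snapshot to another pool (possibly offsite)
    zfs send tank/photos@week1 | ssh backuphost zfs recv backup/photos

    # the following week, send only the changes between the two snapshots
    zfs snapshot tank/photos@week2
    zfs send -i @week1 tank/photos@week2 | ssh backuphost zfs recv backup/photos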