On Oct 8, 2010, at 4:33 AM, Edward Ned Harvey wrote:

>> From: Peter Jeremy [mailto:peter.jer...@alcatel-lucent.com]
>> Sent: Thursday, October 07, 2010 10:02 PM
>>
>> On 2010-Oct-08 09:07:34 +0800, Edward Ned Harvey <sh...@nedharvey.com>
>> wrote:
>>> If you're going raidz3, with 7 disks, then you might as well just make
>>> mirrors instead, and eliminate the slow resilver.
>>
>> There is a difference in reliability: raidzN means _any_ N disks can
>> fail, whereas mirror means one disk in each mirror pair can fail.
>> With a mirror, Murphy's Law says that the second disk to fail will be
>> the pair of the first disk :-).
>
> Maybe. But in reality, you're just guessing the probability of a single
> failure, the probability of multiple failures, and the probability of
> multiple failures within the critical time window and critical redundancy
> set.
>
> The probability of a 2nd failure within the critical time window is smaller
> whenever the critical time window is decreased, and the probability of that
> failure being within the critical redundancy set is smaller whenever your
> critical redundancy set is smaller. So if raidz2 takes twice as long to
> resilver than a mirror, and has a larger critical redundancy set, then you
> haven't gained any probable resiliency over a mirror.
>
> Although it's true with mirrors, it's possible for 2 disks to fail and
> result in loss of pool, I think the probability of that happening is smaller
> than the probability of a 3-disk failure in the raidz2.
>
> How much longer does a 7-disk raidz2 take to resilver as compared to a
> mirror? According to my calculations, it's in the vicinity of 10x longer.
>
This article has been posted elsewhere and is about 10 months old, but it is a good read:

http://queue.acm.org/detail.cfm?id=1670144

Really, there should be a ballpark / back-of-the-napkin formula to calculate this? I've been curious about this too, so here goes a first cut...

DR   = disk reliability, in terms of the chance of the disk dying in any given time period, say any given hour?
DFW  = disk full write: the time to write every sector on the disk. This will vary depending on system load, but it is still an input that can be determined by some testing.
RSM  = resilver time for a mirror of two of the given disks
RSZ1 = resilver time for a raidz1 vdev of the given disks
RSZ2 = resilver time for a raidz2 vdev of the given disks

Chance of losing all data in a mirror: DLM   = RSM * DR
Chance of losing all data in a raidz1: DLRZ1 = RSZ1 * DR
Chance of losing all data in a raidz2: DLRZ2 = RSZ2 * DR * DR

Now, for the above, I'll make some other assumptions...

Let's just guess at a 1-year MTBF for our disks and, for purposes here, flat-line that as a constant chance of failure per hour throughout the year.
Let's presume rebuilding a mirror takes one hour.
Let's presume that a 7-disk raidz1 takes 24 times longer to rebuild one disk than a mirror; I think this would be a 'safe' ratio, to the benefit of the mirror.
Let's presume that a 7-disk raidz2 takes 72 times longer to rebuild one disk than a mirror; this should be 'safe' and again to the benefit of the mirror.

DR for a one-hour period = 1 / (24 hours * 365 days) = .000114, the chance a disk might die in any given hour.

DLM   = 1 hour * DR = .000114
DLRZ1 = 24 hours * (.000114 * 6 disks) = about .0164 (x6 because there are six more drives in the pool, and any one of them could fail)
DLRZ2 = 72 hours * (.000114 * 6 disks) * (.000114 * 5 disks) = about .000028, a much tinier chance of losing all that data.

A better way to think about it, maybe...
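For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the napkin arithmetic so far. It assumes the same flat-line 1-year MTBF and the guessed 1 / 24 / 72 hour resilver times; the function names are made up for illustration, and the result is the same rough product as in the formulas, not a rigorous probability.

```python
HOURS_PER_YEAR = 24 * 365


def hourly_failure_rate(mtbf_hours=HOURS_PER_YEAR):
    """Flat-line chance that a single disk dies in any given hour (DR)."""
    return 1.0 / mtbf_hours


def loss_estimate(resilver_hours, surviving_disks, extra_failures, dr):
    """Napkin figure: resilver window times the chance of each further failure.

    Each further failure can hit any disk that is still alive, so the
    per-disk rate is multiplied by a shrinking disk count (6, then 5, ...).
    """
    estimate = float(resilver_hours)
    for already_failed in range(extra_failures):
        estimate *= dr * (surviving_disks - already_failed)
    return estimate


dr = hourly_failure_rate()

# Assumed resilver times: 1 hour (mirror), 24 hours (raidz1), 72 hours (raidz2).
dlm = loss_estimate(1, 1, 1, dr)     # mirror: only the partner disk matters
dlrz1 = loss_estimate(24, 6, 1, dr)  # 7-disk raidz1: any of 6 survivors
dlrz2 = loss_estimate(72, 6, 2, dr)  # 7-disk raidz2: two more must fail

print(f"DR    = {dr:.6f}")
print(f"DLM   = {dlm:.6f}")
print(f"DLRZ1 = {dlrz1:.6f}")
print(f"DLRZ2 = {dlrz2:.6f}")
```

Changing the resilver hours or the MTBF at the top is enough to see how sensitive the comparison is to those guesses.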
Based on our 1-year flat-line MTBF for disks, to figure out how much faster the mirror must rebuild for reliability to be the same as a raidz2...

DLM = DLRZ2
.000114 * 1 hour = X hours * (.000114 * 6 disks) * (.000114 * 5 disks)
1 / X = (.000114 * 6 disks) * 5
1 / X = .00342
X = about 292

So, the mirror would have to resilver roughly three hundred times faster than the raidz2 (1 / .00342) in order for it to offer the same level of reliability with regard to the chance of losing the entire vdev to additional disk failures during a resilver?

The governing thing here is the O(2) level of reliability for raidz2, based on the expected chance of additional disks failing at any given moment in time, vs. O(1) for mirrors and raidz1. Note that it is O(2) for raidz2 and O(1) for mirror/raidz1 because we are working on the assumption that we have already lost one disk.

With raidz3, we would have 1 / (.000114 * 4 disks remaining in pool), or about 2,000 times more reliability?

Now, the above does not include things like proper statistics: the chances of that 2nd and 3rd disk failing (even correlations) may be higher than our 'flat-line' %/hr based on a 1-year MTBF, or, if all the disks were purchased in the same lots and at the same time, their chances of failing around the same time are higher, etc.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss