On Wed, Apr 17, 2013 at 5:38 AM, Edward Ned Harvey (openindiana) <openindi...@nedharvey.com> wrote:
> > From: Sašo Kiselkov [mailto:skiselkov...@gmail.com]
> >
> > Raid-Z indeed does stripe data across all
> > leaf vdevs (minus parity) and does so by splitting the logical block up
> > into equally sized portions.
>
> Jay, there you have it. You asked why use mirrors, and you said you would
> use raidz2 or raidz3 unless cpu overhead is too much. I recommended using
> mirrors and avoiding raidzN, and here is the answer why.
>
> If you have 16 disks arranged in 8x mirrors, versus 10 disks in raidz2
> which stripes across 8 disks plus 2 parity disks, then the serial write of
> each configuration is about the same; that is, 8x the sustained write speed
> of a single device. But if you have two or more parallel sequential read
> threads, then the sequential read speed of the mirrors will be 16x while
> the raidz2 is only 8x. The mirror configuration can do 8x random write
> while the raidz2 is only 1x. And the mirror can do 16x random read while
> the raidz2 is only 1x.

It (finally) occurs to me that not all mirrors are created equal. I've been
assuming, and probably ignoring hints to the contrary, that what was being
compared here was a raid-z2 configuration against a single 2-way mirror
composed of two 8-disk vdevs. I now realize you're talking about 8 separate
2-disk mirrors organized into a pool: "mirror x1 y1 mirror x2 y2 mirror x3
y3..."

I also realize that almost every discussion I've seen online concerning
mirrors proposes organizing the drives the way I was thinking about it
(which is probably why I was thinking that way). I suppose this is something
different that ZFS brings to the table compared to more conventional
hardware raid.

> In the case you care about the least, they're equal. In the case you care
> about most, the mirror configuration is 16x faster.
>
> You also said the raidz2 will offer more protection against failure,
> because you can survive any two disk failures (but no more.) I would argue
> this is incorrect (I've done the probability analysis before). Mostly
> because the resilver time in the mirror configuration is 8x to 16x faster
> (there's 1/8 as much data to resilver, and IOPS is limited by a single
> disk, not the "worst" of several disks, which introduces another factor up
> to 2x, increasing the 8x as high as 16x), so the smaller resilver window
> means lower probability of "concurrent" failures on the critical vdev.
> We're talking about 12 hours versus 1 week, actual result of my machines
> in production. Also, while it's possible to fault the pool with only 2
> failures in the mirror configuration, the probability is against that
> happening. The first disk failure probability is 1/16 for each disk ...
> And then if you have a 2nd concurrent failure, there's a 14/15 probability
> that it occurs on a separately independent (safe) mirror. The 3rd
> concurrent failure 12/14 chance of being safe. The 4th concurrent failure
> 10/13 chance of being safe. Etc. The mirror configuration can probably
> withstand a higher number of failures, and also the resilver window for
> each failure is smaller. When you look at the total probability of pool
> failure, they were both like 10^-17 or something like that. In other
> words, we're splitting hairs but as long as we are, we might as well point
> out that they're both about the same.

This also starts to make a lot more sense. It confused the hell out of me
the first three times I read it. I'm going to have to ponder this a bit
more, as my thinking has been heavily influenced by the more conventional
mirror arrangement.
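
For anyone else who had to read the 14/15, 12/14, 10/13 chain a few times, here is a minimal Python sketch that reproduces it. It assumes a pool built only from 2-way mirrors and that each new concurrent failure is equally likely to hit any disk still running; it deliberately ignores resilver time, which the quoted message argues matters even more. The function names are made up for illustration and are not part of any ZFS tool.

```python
from fractions import Fraction


def safe_failure_chances(num_mirrors, num_failures):
    """Conditional chance that each successive concurrent failure is 'safe'.

    Assumption (not from ZFS itself): every surviving disk is equally
    likely to be the next one to fail, and all earlier failures landed
    on distinct 2-way mirrors.
    """
    total_disks = 2 * num_mirrors
    if not 0 <= num_failures <= total_disks:
        raise ValueError("num_failures must be between 0 and the disk count")

    chances = []
    for k in range(1, num_failures + 1):
        remaining = total_disks - (k - 1)  # disks still running
        exposed = k - 1                    # partners of already-failed disks
        safe = max(remaining - exposed, 0)  # disks whose loss keeps the pool up
        chances.append(Fraction(safe, remaining))
    return chances


def survival_probability(num_mirrors, num_failures):
    """Chance the pool is still up after num_failures concurrent failures."""
    result = Fraction(1)
    for chance in safe_failure_chances(num_mirrors, num_failures):
        result *= chance
    return result


if __name__ == "__main__":
    # 16 disks arranged as 8 two-way mirrors, as in the quoted example.
    for n in range(1, 6):
        p = survival_probability(8, n)
        print(f"{n} concurrent failures: pool survives with probability {p}")
```

Under those assumptions, the 8-mirror pool survives two concurrent failures with probability 14/15, three with 4/5, and four with 8/13. Fraction reduces 12/14 to 6/7, so the per-step values print in lowest terms.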