Yes, I did mean 6+2; thank you for fixing the typo. I'm actually leaning more towards running a simple 7+1 RAIDZ1. Running this with 1TB disks is not a problem, but I wanted to investigate at what drive size the "scales would tip". I understand RAIDZ2 protects against a second failure during the rebuild process. Currently, my RAIDZ1 takes 24 hours to rebuild a failed disk, so with 2TB disks, and a worst case of 2 days, that is my 'exposure' time.
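As a rough way of seeing where the scales might tip, here is a quick back-of-envelope sketch I put together (Python). It assumes a flat 3% annual failure rate per drive and that rebuild time scales linearly from the 24 hours I see with 1TB disks; both numbers are crude assumptions for illustration, not measurements:

    # Back-of-envelope sketch (not a rigorous model): probability that at least
    # one of the 7 surviving drives in a 7+1 RAIDZ1 fails during the rebuild
    # window. Assumes a constant annual failure rate (AFR) and rebuild time
    # scaling linearly from 24 h at 1 TB -- both illustrative assumptions.

    AFR = 0.03              # assumed 3% annual failure rate per drive
    SURVIVORS = 7           # drives that must stay alive during a 7+1 rebuild
    HOURS_PER_YEAR = 24 * 365

    for size_tb in (1, 2, 4, 6):
        rebuild_hours = 24 * size_tb                  # linear extrapolation from 1 TB / 24 h
        p_one = AFR * rebuild_hours / HOURS_PER_YEAR  # per-drive failure chance in the window
        p_any = 1 - (1 - p_one) ** SURVIVORS          # chance that any survivor fails
        print(f"{size_tb} TB: rebuild ~{rebuild_hours} h, "
              f"P(second failure during rebuild) ~ {p_any:.4%}")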
For example, I would hazard a confident guess that a 7+1 RAIDZ1 with 6TB drives wouldn't be a smart idea; I'm just trying to extrapolate down from there. I will be running a hot (or maybe cold) spare, so I don't need to factor in the time it takes for a manufacturer to replace the drive.

On Mon, Feb 7, 2011 at 2:48 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Matthew Angelo
>>
>> My question is, how do I determine which of the following zpool and
>> vdev configurations I should run to maximize space whilst mitigating
>> rebuild failure risk?
>>
>> 1. 2x RAIDZ(3+1) vdev
>> 2. 1x RAIDZ(7+1) vdev
>> 3. 1x RAIDZ2(6+2) vdev
>>
>> I just want to prove I shouldn't run a plain old RAID5 (RAIDZ) with 8x
>> 2TB disks.
>
> (Corrected typo: 6+2 for you.)
> Sounds like you have made up your mind already. Nothing wrong with that. You
> are apparently uncomfortable running with only one disk's worth of redundancy.
> There is nothing fundamentally wrong with the raidz1 configuration, but the
> probability of failure is obviously higher.
>
> The question is how you calculate that probability. Because if we're talking
> about 5e-21 versus 3e-19, then you probably don't care about the difference...
> they're both essentially zero probability. Well... there's no good
> answer to that.
>
> With the cited bit error rate, you're only representing the
> probability of a bit error. You're not representing the probability of a
> failed drive, and you're not representing the probability of a drive
> failure within a specified time window. What you really care about is the
> probability of two drives (or three drives) failing concurrently, in which
> case you need to model the probability of any one drive failing within a
> specified time window. And even if you model that probability, in
> reality it's not linear. The probability of a drive failing between 1 yr and
> 1 yr + 3 hrs is smaller than the probability of it failing between 3 yrs
> and 3 yrs + 3 hrs, because after 3 yrs the failure rate will be higher. So
> after 3 yrs, the probability of multiple simultaneous failures is higher.
>
> I recently saw some Seagate data sheets which specified the annual disk
> failure rate to be 0.3%. Again, this is a linear model representing a
> nonlinear reality.
>
> Suppose one disk fails... How many weeks does it take to get a replacement
> onsite under the 3 yr limited mail-in warranty?
>
> But then again, after 3 years you're probably considering this your antique
> hardware, and all the stuff you care about is on a newer server. Etc.
>
> There's no good answer to your question.
>
> You are obviously uncomfortable with a single disk's worth of redundancy. Go
> with your gut. Sleep well at night. It only costs you $100. You probably
> have a cell phone with no backups worth more than that in your pocket right
> now.
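P.S. To make the point above about the bit error rate concrete (as distinct from whole-drive failure), here is a similarly rough sketch of the chance of hitting an unrecoverable read error while resilvering a 7+1 RAIDZ1. The 1-in-1e14-bits URE rate and the assumption that the rebuild reads every bit on the seven surviving drives are both just illustrative, not vendor-confirmed numbers:

    # Sketch of the other failure mode: an unrecoverable read error (URE) while
    # resilvering, rather than a second whole-drive failure. Assumes one URE per
    # 1e14 bits read and that a RAIDZ1 rebuild reads all data on the 7 surviving
    # drives -- both are illustrative assumptions.
    import math

    URE_RATE = 1e-14        # assumed probability of a URE per bit read
    SURVIVORS = 7

    for size_tb in (1, 2, 4, 6):
        bits_read = SURVIVORS * size_tb * 1e12 * 8     # decimal TB -> bits
        p_ure = 1 - math.exp(-URE_RATE * bits_read)    # Poisson approximation
        print(f"{size_tb} TB drives: P(at least one URE during resilver) ~ {p_ure:.1%}")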