cool. comments below... Kent Watsen wrote:
Richard's blog analyzes MTTDL as a function of N+P+S: http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl

But to understand how to best utilize an array with a fixed number of drives, I add the following constraints:
- N+P should follow the ZFS best-practice rule of N={2,4,8} and P={1,2}
- all sets in an array should be configured similarly
- the MTTDL for S sets is equal to (MTTDL for one set)/S
Yes, these are reasonable and will reduce the problem space, somewhat.
I got the following results by varying the NUM_BAYS parameter in the source code below:

*_4 bays w/ 300 GB drives having MTBF=4 years_*
- can have 1 (2+1) w/ 1 spares providing 600 GB with MTTDL of 5840.00 years
- can have 1 (2+2) w/ 0 spares providing 600 GB with MTTDL of 799350.00 years
- can have 0 (4+1) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (4+2) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+1) w/ 4 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+2) w/ 4 spares providing 0 GB with MTTDL of Inf years

*_8 bays w/ 300 GB drives having MTBF=4 years_*
- can have 2 (2+1) w/ 2 spares providing 1200 GB with MTTDL of 2920.00 years
- can have 2 (2+2) w/ 0 spares providing 1200 GB with MTTDL of 399675.00 years
- can have 1 (4+1) w/ 3 spares providing 1200 GB with MTTDL of 1752.00 years
- can have 1 (4+2) w/ 2 spares providing 1200 GB with MTTDL of 2557920.00 years
- can have 0 (8+1) w/ 8 spares providing 0 GB with MTTDL of Inf years
- can have 0 (8+2) w/ 8 spares providing 0 GB with MTTDL of Inf years

*_12 bays w/ 300 GB drives having MTBF=4 years_*
- can have 4 (2+1) w/ 0 spares providing 2400 GB with MTTDL of 365.00 years
- can have 3 (2+2) w/ 0 spares providing 1800 GB with MTTDL of 266450.00 years
- can have 2 (4+1) w/ 2 spares providing 2400 GB with MTTDL of 876.00 years
- can have 2 (4+2) w/ 0 spares providing 2400 GB with MTTDL of 79935.00 years
- can have 1 (8+1) w/ 3 spares providing 2400 GB with MTTDL of 486.67 years
- can have 1 (8+2) w/ 2 spares providing 2400 GB with MTTDL of 426320.00 years

*_16 bays w/ 300 GB drives having MTBF=4 years_*
- can have 5 (2+1) w/ 1 spares providing 3000 GB with MTTDL of 1168.00 years
- can have 4 (2+2) w/ 0 spares providing 2400 GB with MTTDL of 199837.50 years
- can have 3 (4+1) w/ 1 spares providing 3600 GB with MTTDL of 584.00 years
- can have 2 (4+2) w/ 4 spares providing 2400 GB with MTTDL of 1278960.00 years
- can have 1 (8+1) w/ 7 spares providing 2400 GB with MTTDL of 486.67 years
- can have 1 (8+2) w/ 6 spares providing 2400 GB with MTTDL of 426320.00 years

*_20 bays w/ 300 GB drives having MTBF=4 years_*
- can have 6 (2+1) w/ 2 spares providing 3600 GB with MTTDL of 973.33 years
- can have 5 (2+2) w/ 0 spares providing 3000 GB with MTTDL of 159870.00 years
- can have 4 (4+1) w/ 0 spares providing 4800 GB with MTTDL of 109.50 years
- can have 3 (4+2) w/ 2 spares providing 3600 GB with MTTDL of 852640.00 years
- can have 2 (8+1) w/ 2 spares providing 4800 GB with MTTDL of 243.33 years
- can have 2 (8+2) w/ 0 spares providing 4800 GB with MTTDL of 13322.50 years

*_24 bays w/ 300 GB drives having MTBF=4 years_*
- can have 8 (2+1) w/ 0 spares providing 4800 GB with MTTDL of 182.50 years
- can have 6 (2+2) w/ 0 spares providing 3600 GB with MTTDL of 133225.00 years
- can have 4 (4+1) w/ 4 spares providing 4800 GB with MTTDL of 438.00 years
- can have 4 (4+2) w/ 0 spares providing 4800 GB with MTTDL of 39967.50 years
- can have 2 (8+1) w/ 6 spares providing 4800 GB with MTTDL of 243.33 years
- can have 2 (8+2) w/ 4 spares providing 4800 GB with MTTDL of 213160.00 years

While it's true that RAIDZ2 is /much/ safer than RAIDZ, it seems that /any/ RAIDZ configuration will outlive me, and so I conclude that RAIDZ2 is unnecessary in a practical sense... This conclusion surprises me given the amount of attention people give to double-parity solutions - what am I overlooking?
You are overlooking statistics :-). As I discuss in http://blogs.sun.com/relling/entry/using_mtbf_and_time_dependent the MTBF (F == death) of children aged 5-14 in the US is 4,807 years, but clearly no child will live anywhere close to 4,807 years. The number itself is not really that important, but it does provide a way to compare designs. In other words, the numbers are important in a relative sense.

Another observation is that the MTBF does change over time, but the math to consider that case is much more difficult. It is also difficult to find any real data or data sheets which would show that number. There are other techniques to model this, but they won't change the relative improvement of raidz2 over raidz, so what you have is reasonable.

I had the fortune to hear Dave Patterson speak a few years ago. He said that (anecdotally) people would come up to him upset because they had lost data with RAID-5 systems. He said that it would have been better if he had done RAID-6 instead of RAID-5... hindsight is always 20/20 :-)
_*Source Code*_ (compile with: cc -std=c99 <filename> -lm) [it's more than 80 columns - sorry!]
This is relatively easy to implement in a spreadsheet, too. But as you begin to notice, there are hundreds or thousands of possible combinations as you add disk drives.
#include <stdio.h>
#include <math.h>

#define NUM_BAYS 24
#define DRIVE_SIZE_GB 300
#define MTBF_YEARS 4
I think this is pessimistic :-)
#define MTTR_HOURS_NO_SPARE 16
I think this is optimistic :-)
#define MTTR_HOURS_SPARE 4
I also think this is optimistic and pessimistic :-) With ZFS, you only recover the data used. This is an advantage over other LVMs which try to reconstruct the whole space. In the best case, this number is almost zero and in the worst case, it is approximately the same as an LVM with aggressive reconstruction. There is a possibility that it is worse for some use cases, but that has not been characterized yet. Since you restrict the types to raidz and raidz2, you can simplify the analysis a bit, which helps.
int main() {
    printf("\n");
    printf("%d bays w/ %d GB drives having MTBF=%d years\n",
           NUM_BAYS, DRIVE_SIZE_GB, MTBF_YEARS);
    for (int num_drives = 2; num_drives <= 8; num_drives *= 2) {
        for (int num_parity = 1; num_parity <= 2; num_parity++) {
            double mttdl;
            int mtbf_hours = MTBF_YEARS * 365 * 24;
            int total_num_drives = num_drives + num_parity;
            int num_instances = NUM_BAYS / total_num_drives;
            int num_spares = NUM_BAYS % total_num_drives;
            double mttr = num_spares == 0 ? MTTR_HOURS_NO_SPARE : MTTR_HOURS_SPARE;
            int total_capacity = num_drives * num_instances * DRIVE_SIZE_GB;
            if (num_parity == 1) {
                mttdl = pow(mtbf_hours, 2.0)
                      / (total_num_drives * (total_num_drives - 1) * mttr);
            } else { /* num_parity == 2 */
                mttdl = pow(mtbf_hours, 3.0)
                      / (total_num_drives * (total_num_drives - 1)
                         * (total_num_drives - 2) * pow(mttr, 2.0));
            }
            printf(" - can have %d (%d+%d) w/ %d spares providing %d GB"
                   " with MTTDL of %.2f years\n",
                   num_instances, num_drives, num_parity, num_spares,
                   total_capacity, mttdl / 24 / 365 / num_instances);
        }
    }
}
There are many more facets to this sort of analysis, which is why I wrote RAIDoptimizer. Attached is similar output from RAIDoptimizer in a spreadsheet so you can sort or plot the data as you'd like. The algorithms are described in various blog entries at: http://blogs.sun.com/relling I'll note that RAIDoptimizer doesn't currently let me set an MTBF < 100,000 hours, so I'll take that as an RFE. -- richard
Attachment: raidz-raidz2-MTTDL-example.ods (OpenDocument spreadsheet)
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss