cool.  comments below...

Kent Watsen wrote:
Richard's blog analyzes MTTDL as a function of N+P+S:
    http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl

But to understand how to best utilize an array with a fixed number of drives, I add the following constraints:
  - N+P should follow ZFS best-practice rule of N={2,4,8} and P={1,2}
  - all sets in an array should be configured similarly
  - the MTTDL for S sets is equal to (MTTDL for one set)/S

Yes, these are reasonable and will reduce the problem space, somewhat.

I got the following results by varying the NUM_BAYS parameter in the source code below:

    *_4 bays w/ 300 GB drives having MTBF=4 years_*
      - can have 1 (2+1) w/ 1 spares providing 600 GB with MTTDL of 5840.00 years
      - can have 1 (2+2) w/ 0 spares providing 600 GB with MTTDL of 799350.00 years
      - can have 0 (4+1) w/ 4 spares providing 0 GB with MTTDL of Inf years
      - can have 0 (4+2) w/ 4 spares providing 0 GB with MTTDL of Inf years
      - can have 0 (8+1) w/ 4 spares providing 0 GB with MTTDL of Inf years
      - can have 0 (8+2) w/ 4 spares providing 0 GB with MTTDL of Inf years

    *_8 bays w/ 300 GB drives having MTBF=4 years_*
      - can have 2 (2+1) w/ 2 spares providing 1200 GB with MTTDL of 2920.00 years
      - can have 2 (2+2) w/ 0 spares providing 1200 GB with MTTDL of 399675.00 years
      - can have 1 (4+1) w/ 3 spares providing 1200 GB with MTTDL of 1752.00 years
      - can have 1 (4+2) w/ 2 spares providing 1200 GB with MTTDL of 2557920.00 years
      - can have 0 (8+1) w/ 8 spares providing 0 GB with MTTDL of Inf years
      - can have 0 (8+2) w/ 8 spares providing 0 GB with MTTDL of Inf years

    *_12 bays w/ 300 GB drives having MTBF=4 years_*
      - can have 4 (2+1) w/ 0 spares providing 2400 GB with MTTDL of 365.00 years
      - can have 3 (2+2) w/ 0 spares providing 1800 GB with MTTDL of 266450.00 years
      - can have 2 (4+1) w/ 2 spares providing 2400 GB with MTTDL of 876.00 years
      - can have 2 (4+2) w/ 0 spares providing 2400 GB with MTTDL of 79935.00 years
      - can have 1 (8+1) w/ 3 spares providing 2400 GB with MTTDL of 486.67 years
      - can have 1 (8+2) w/ 2 spares providing 2400 GB with MTTDL of 426320.00 years

    *_16 bays w/ 300 GB drives having MTBF=4 years_*
      - can have 5 (2+1) w/ 1 spares providing 3000 GB with MTTDL of 1168.00 years
      - can have 4 (2+2) w/ 0 spares providing 2400 GB with MTTDL of 199837.50 years
      - can have 3 (4+1) w/ 1 spares providing 3600 GB with MTTDL of 584.00 years
      - can have 2 (4+2) w/ 4 spares providing 2400 GB with MTTDL of 1278960.00 years
      - can have 1 (8+1) w/ 7 spares providing 2400 GB with MTTDL of 486.67 years
      - can have 1 (8+2) w/ 6 spares providing 2400 GB with MTTDL of 426320.00 years

    *_20 bays w/ 300 GB drives having MTBF=4 years_*
      - can have 6 (2+1) w/ 2 spares providing 3600 GB with MTTDL of 973.33 years
      - can have 5 (2+2) w/ 0 spares providing 3000 GB with MTTDL of 159870.00 years
      - can have 4 (4+1) w/ 0 spares providing 4800 GB with MTTDL of 109.50 years
      - can have 3 (4+2) w/ 2 spares providing 3600 GB with MTTDL of 852640.00 years
      - can have 2 (8+1) w/ 2 spares providing 4800 GB with MTTDL of 243.33 years
      - can have 2 (8+2) w/ 0 spares providing 4800 GB with MTTDL of 13322.50 years

    *_24 bays w/ 300 GB drives having MTBF=4 years_*
      - can have 8 (2+1) w/ 0 spares providing 4800 GB with MTTDL of 182.50 years
      - can have 6 (2+2) w/ 0 spares providing 3600 GB with MTTDL of 133225.00 years
      - can have 4 (4+1) w/ 4 spares providing 4800 GB with MTTDL of 438.00 years
      - can have 4 (4+2) w/ 0 spares providing 4800 GB with MTTDL of 39967.50 years
      - can have 2 (8+1) w/ 6 spares providing 4800 GB with MTTDL of 243.33 years
      - can have 2 (8+2) w/ 4 spares providing 4800 GB with MTTDL of 213160.00 years

While it's true that RAIDZ2 is /much/ safer than RAIDZ, it seems that /any/ RAIDZ configuration will outlive me, so I conclude that RAIDZ2 is unnecessary in a practical sense... This conclusion surprises me given the amount of attention people give to double-parity solutions - what am I overlooking?

You are overlooking statistics :-).  As I discuss in
        http://blogs.sun.com/relling/entry/using_mtbf_and_time_dependent
the MTBF (F == death) of children aged 5-14 in the US is 4,807 years, but
clearly no child will live anywhere close to 4,807 years.  The number itself
is not really that important, but it does provide a way to compare designs.
In other words, the numbers are important in a relative sense.

Another observation is that the MTBF does change over time, but the math to
consider that case is much more difficult.  It is also difficult to find any
real data or data sheets which would show that number.  There are other
techniques to model this, but they won't change the relative improvement of
raidz2 over raidz, so what you have is reasonable.

I had the fortune to hear Dave Patterson speak a few years ago.  He said
that (anecdotally) people would come up to him upset because they had lost
data with RAID-5 systems.  He said that it would have been better if he
had done RAID-6 instead of RAID-5... hindsight is always 20/20 :-)

_*Source Code*_ (compile with: cc -std=c99 <filename> -lm) [it's more than 80 columns - sorry!]

This is relatively easy to implement in a spreadsheet, too.  But as you
begin to notice, there are hundreds or thousands of possible combinations
as you add disk drives.

#include <stdio.h>
#include <math.h>

#define NUM_BAYS 24
#define DRIVE_SIZE_GB 300
#define MTBF_YEARS 4

I think this is pessimistic :-)

#define MTTR_HOURS_NO_SPARE 16

I think this is optimistic :-)

#define MTTR_HOURS_SPARE 4

I also think this is optimistic and pessimistic :-)  With ZFS, you only
recover the data used.  This is an advantage over other LVMs which try to
reconstruct the whole space.  In the best case, this number is almost zero
and in the worst case, it is approximately the same as an LVM with aggressive
reconstruction.  There is a possibility that it is worse for some use cases,
but that has not been characterized yet.

Since you restrict the types to raidz and raidz2, you can simplify the
analysis a bit, which helps.

int main(void) {

    printf("\n");
    printf("%d bays w/ %d GB drives having MTBF=%d years\n", NUM_BAYS, DRIVE_SIZE_GB, MTBF_YEARS);
    /* data drives per set: 2, 4, 8 */
    for (int num_drives=2; num_drives<=8; num_drives*=2) {
        /* parity drives per set: 1 (raidz) or 2 (raidz2) */
        for (int num_parity=1; num_parity<=2; num_parity++) {
            double  mttdl;

            int     mtbf_hours          = MTBF_YEARS * 365 * 24;
            int     total_num_drives    = num_drives + num_parity;
            int     num_instances       = NUM_BAYS / total_num_drives;
            int     num_spares          = NUM_BAYS % total_num_drives;
            double  mttr                = num_spares==0 ? MTTR_HOURS_NO_SPARE : MTTR_HOURS_SPARE;
            int     total_capacity      = num_drives * num_instances * DRIVE_SIZE_GB;

            if (num_parity==1) {
                mttdl = pow(mtbf_hours, 2.0) / (total_num_drives * (total_num_drives-1) * mttr );
            } else {
                mttdl = pow(mtbf_hours, 3.0) / (total_num_drives * (total_num_drives-1) * (total_num_drives-2) * pow(mttr, 2.0));
            }

            /* with zero sets the last division is by zero, which prints as Inf */
            printf(" - can have %d (%d+%d) w/ %d spares providing %d GB with MTTDL of %.2f years\n",
                    num_instances,
                    num_drives, num_parity,
                    num_spares,
                    total_capacity,
                    mttdl/24/365/num_instances
                );
        }
    }
    return 0;
}

There are many more facets to this sort of analysis, which is
why I wrote RAIDoptimizer.  Attached is a similar output from RAIDoptimizer
in a spreadsheet so you can sort or plot the data as you'd like.  The
algorithms are described in various blog entries at:
        http://blogs.sun.com/relling

I'll note that RAIDoptimizer doesn't currently let me set an MTBF < 100,000
hours, so I'll take that as an RFE.
 -- richard

Attachment: raidz-raidz2-MTTDL-example.ods
Description: application/vnd.oasis.opendocument.spreadsheet
