On 2011-Feb-07 14:22:51 +0800, Matthew Angelo <bang...@gmail.com> wrote:
>I'm actually more leaning towards running a simple 7+1 RAIDZ1.
>Running this with 1TB is not a problem but I just wanted to
>investigate at what TB size the "scales would tip".

It's not that simple.  Whilst resilver time is proportional to device
size, it's far more impacted by the degree of fragmentation of the
pool.  And there's no 'tipping point' - it's a gradual slope so it's
really up to you to decide where you want to sit on the probability
curve.

>   I understand
>RAIDZ2 protects against failures during a rebuild process.

This would be its current primary purpose.

>  Currently,
>my RAIDZ1 takes 24 hours to rebuild a failed disk, so with 2TB disks
>and worse case assuming this is 2 days this is my 'exposure' time.

Unless this is a write-once pool, you can probably also assume that
your pool will get more fragmented over time, so by the time your
pool gets to twice its current capacity, it might well take 3 days
to rebuild due to the additional fragmentation.
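
To put rough numbers on that slope, here is a minimal sketch of the
chance that a second disk fails while a 7+1 RAIDZ1 is resilvering.
The 5% annual failure rate is purely an illustrative assumption, as is
the constant, independent failure rate; neither figure comes from this
thread or from any measured data.

```python
import math

def p_second_failure(surviving_disks, afr, window_days):
    """Probability that at least one surviving disk fails in the window.

    Assumes each disk fails independently at a constant rate derived
    from the annual failure rate (afr). That assumption is optimistic.
    """
    rate_per_day = -math.log(1.0 - afr) / 365.0
    return 1.0 - math.exp(-surviving_disks * rate_per_day * window_days)

# 7 surviving disks, assumed 5% AFR, resilver taking 1, 2 or 3 days.
for days in (1, 2, 3):
    print(days, p_second_failure(7, 0.05, days))
```

The result grows almost linearly with the resilver time, which is why
there is a slope rather than a tipping point.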

One point I haven't seen mentioned elsewhere in this thread is that
all the calculations so far have assumed that drive failures were
independent.  In practice, this probably isn't true.  All HDD
manufacturers have their "off" days - where whole batches or models of
disks are cr*p and fail unexpectedly early.  The WD EARS is simply a
demonstration that it's WD's turn to turn out junk.  Your best
protection against this is to have disks from enough different batches
that a batch failure won't take out your pool.
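
One way to sanity-check a layout against this is to ask whether any
single batch contributes more disks to a vdev than the vdev has parity.
The sketch below does just that; the batch labels are hypothetical, and
it takes the worst case that every disk in a bad batch dies together.

```python
from collections import Counter

def survives_any_single_batch_loss(disk_batches, parity):
    """True if losing every disk from any one batch leaves the vdev intact.

    disk_batches: one batch label per disk in the vdev (hypothetical labels).
    parity: number of disks the vdev can lose (1 for RAIDZ1, 2 for RAIDZ2).
    """
    worst = max(Counter(disk_batches).values())
    return worst <= parity

mixed = ["A", "B", "C", "D", "E", "F", "G", "H"]   # eight different batches
paired = ["A", "A", "B", "B", "C", "C", "D", "D"]  # two disks per batch

print(survives_any_single_batch_loss(mixed, 1))   # True
print(survives_any_single_batch_loss(paired, 1))  # False
print(survives_any_single_batch_loss(paired, 2))  # True
```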

A PSU, fan or SATA controller failure is likely to take out multiple
disks at once, but it's far harder to build in enough redundancy to
handle that, so your best approach is probably to have good backups.

>I will be running hot (or maybe cold) spare.  So I don't need to
>factor in "Time it takes for a manufacture to replace the drive".

In which case, the question is more whether 8-way RAIDZ1 with a
hot spare (7+1+1) is better than 9-way RAIDZ2 (7+2).  In the latter
case, your "hot spare" is already part of the pool so you don't
lose the time-to-notice plus time-to-resilver before regaining
redundancy.  The downside is that actively using the "hot spare"
may increase the probability of it failing.
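
A back-of-envelope comparison of the two layouts, given that one disk
has already failed, might look like the sketch below. Every number in
it is an assumption chosen for illustration: a 5% annual failure rate,
independent failures, a 2-day notice-plus-resilver window for the
RAIDZ1 hot spare and a 7-day window to replace the disk in the RAIDZ2.
It also ignores the extra wear on the always-active ninth disk and the
slightly higher chance of a first failure with nine active disks.

```python
from math import comb

def p_disk_fails(afr, days):
    """Probability one disk fails within `days`, from an annual failure rate."""
    return 1.0 - (1.0 - afr) ** (days / 365.0)

def p_at_least(k, n, p):
    """P(at least k of n independent disks fail), binomial model."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

afr = 0.05  # assumed annual failure rate, illustrative only

# 7+1+1: after one failure, 7 survivors and no redundancy until the
# spare has resilvered; one more failure loses the pool.
raidz1_loss = p_at_least(1, 7, p_disk_fails(afr, 2.0))

# 7+2: after one failure, 8 survivors and one parity disk left;
# two more failures before the replacement resilvers lose the pool.
raidz2_loss = p_at_least(2, 8, p_disk_fails(afr, 7.0))

print(raidz1_loss, raidz2_loss)
```

Under these assumptions the RAIDZ2 figure comes out well below the
RAIDZ1 one even with the longer window, but the ranking depends on the
inputs, so plug in your own.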

-- 
Peter Jeremy
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss