On 2011-Feb-07 14:22:51 +0800, Matthew Angelo <bang...@gmail.com> wrote:
>I'm actually more leaning towards running a simple 7+1 RAIDZ1.
>Running this with 1TB is not a problem but I just wanted to
>investigate at what TB size the "scales would tip".

It's not that simple.  Whilst resilver time is proportional to device
size, it's far more impacted by the degree of fragmentation of the
pool.  And there's no 'tipping point' - it's a gradual slope, so it's
really up to you to decide where you want to sit on the probability
curve.

> I understand
>RAIDZ2 protects against failures during a rebuild process.

This would be its current primary purpose.

> Currently,
>my RAIDZ1 takes 24 hours to rebuild a failed disk, so with 2TB disks
>and worse case assuming this is 2 days this is my 'exposure' time.

Unless this is a write-once pool, you can probably also assume that
your pool will get more fragmented over time, so by the time your
pool gets to twice its current capacity, it might well take 3 days
to rebuild due to the additional fragmentation.

One point I haven't seen mentioned elsewhere in this thread is that
all the calculations so far have assumed that drive failures are
independent.  In practice, this probably isn't true.  All HDD
manufacturers have their "off" days - where whole batches or models
of disks are cr*p and fail unexpectedly early.  The WD EARS is simply
a demonstration that it's WD's turn to turn out junk.  Your best
protection against this is to have disks from enough different
batches that a batch failure won't take out your pool.

PSU, fan and SATA controller failures are likely to take out multiple
disks, but it's far harder to include enough redundancy to handle
this, and your best approach is probably to have good backups.

>I will be running hot (or maybe cold) spare. So I don't need to
>factor in "Time it takes for a manufacture to replace the drive".

In which case, the question is more whether 8-way RAIDZ1 with a
hot spare (7+1+1) is better than 9-way RAIDZ2 (7+2).  In the latter
case, your "hot spare" is already part of the pool so you don't
lose the time-to-notice plus time-to-resilver before regaining
redundancy.
The downside is that actively using the "hot spare" may increase the
probability of it failing.

-- 
Peter Jeremy
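
For anyone who wants to put rough numbers on the RAIDZ1 versus RAIDZ2
comparison above, here is a minimal Python sketch.  It assumes
independent failures at a constant rate, which (as noted above) is
probably optimistic.  The 5% annual failure rate and the function
names are illustrative assumptions, not measured values, and
unrecoverable read errors during the resilver are ignored.

```python
import math


def per_disk_failure_prob(afr, days):
    """P(one disk fails within `days`), assuming a constant failure rate
    calibrated so that P(fail within 365 days) == afr."""
    rate_per_day = -math.log(1.0 - afr) / 365.0
    return 1.0 - math.exp(-rate_per_day * days)


def raidz1_loss_prob(survivors, afr, days):
    """Degraded RAIDZ1 has no redundancy left: losing any one of the
    surviving disks during the window loses the pool."""
    p = per_disk_failure_prob(afr, days)
    return 1.0 - (1.0 - p) ** survivors


def raidz2_loss_prob(survivors, afr, days):
    """Degraded RAIDZ2 tolerates one more failure: the pool is lost
    only if two or more surviving disks fail during the window."""
    p = per_disk_failure_prob(afr, days)
    q = 1.0 - p
    return 1.0 - q ** survivors - survivors * p * q ** (survivors - 1)


if __name__ == "__main__":
    AFR = 0.05  # assumed annual failure rate per disk (illustrative only)
    for days in (1, 2, 3):
        z1 = raidz1_loss_prob(7, AFR, days)  # 7+1 RAIDZ1, one disk down
        z2 = raidz2_loss_prob(8, AFR, days)  # 7+2 RAIDZ2, one disk down
        print(f"{days} day window: RAIDZ1 {z1:.2e}  RAIDZ2 {z2:.2e}")
```

Using the same window for both layouts is a simplification: with a
hot spare the RAIDZ1 window is roughly the resilver time, whereas the
RAIDZ2 window also includes the time to notice and swap the disk.
Correlated batch failures would push both figures up.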