On Wed, 2006-06-28 at 13:24 -0400, Jonathan Edwards wrote:
> On Jun 28, 2006, at 12:32, Erik Trimble wrote:
> > The main reason I don't see ZFS mirror / HW RAID5 as useful is this:
> >
> > ZFS mirror / RAID5:   capacity = (N / 2) - 1
> >                       speed << N / 2 - 1
> >                       minimum # disks to lose before loss of data: 4
> >                       maximum # disks to lose before loss of data: (N / 2) + 2
>
> shouldn't that be capacity = ((N - 1) / 2) ?

Nope. Take 12 drives, for instance: a mirror of two 6-drive RAID5 arrays, where each RAID5 set actually provides only 5 drives' worth of capacity. N = 12, so (12 / 2) - 1 = 6 - 1 = 5.
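If anyone wants to sanity-check that arithmetic, here's a quick back-of-the-envelope Python sketch (the function and names are just mine for illustration, nothing ZFS-specific) that reproduces the capacity and disk-loss figures for the mirror-of-RAID5 layout above and the mirror-of-stripe layout below:

    # Figures for a ZFS mirror built on top of two hardware arrays of
    # n/2 drives each.  Capacity is counted in drives' worth of space;
    # the min/max numbers are the disk-loss figures quoted in the thread.
    def zfs_mirror_figures(hw_layout: str, n: int) -> dict:
        side = n // 2
        if hw_layout == "raid5":
            return {
                "capacity": side - 1,        # each side gives up one drive to parity
                "min_disks_to_lose": 4,      # two failures on each side
                "max_disks_to_lose": side + 2,  # an entire side plus two on the survivor
            }
        if hw_layout == "stripe":
            return {
                "capacity": side,            # no parity overhead
                "min_disks_to_lose": 2,      # one failure on each side
                "max_disks_to_lose": side + 1,  # an entire side plus one on the survivor
            }
        raise ValueError(f"unknown layout: {hw_layout}")

    for layout in ("raid5", "stripe"):
        print(layout, zfs_mirror_figures(layout, 12))

For 12 drives that prints a capacity of 5 vs. 6 and loss thresholds of 4/8 vs. 2/7, which is where the 5-drive figure above comes from.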
>
> loss of a single disk would cause a rebuild on the R5 stripe which
> could affect performance on that side of the mirror. Generally
> speaking good RAID controllers will dedicate processors and channels
> to calculate the parity and write it out so you're not impacted from
> the host access PoV. There is a similar sort of CoW behaviour that
> can happen between the array cache and the drives, but in the ideal
> case you're dealing with this in dedicated hw instead of shared hw.

But in every case I've ever observed, even with hardware assist, writing to an N-drive RAID5 array is slower than writing to an (N-1)-drive HW striped array. NVRAM can of course mitigate this somewhat, but the bottom line is that RAID 5/6 always requires more work than simple striping, and an N-drive striped array will always outperform an N-drive RAID5/6 array. Always. I agree there is some latitude here for array design, cache performance, and workload variance, but I've compared what should be the optimal RAID5 workload (large streaming writes and streaming reads) against an identical number of striped drives, and BEST CASE you are looking at the RAID5 performing at (N-1)/N of the stripe.

[In reality, that isn't quite the best case. The true best case is that RAID5 matches striping, for reads of size <= (stripe size) * (N-1).]

> > ZFS mirror / HW Stripe:   capacity = N / 2
> >                           speed >= N / 2
> >                           minimum # disks to lose before loss of data: 2
> >                           maximum # disks to lose before loss of data: (N / 2) + 1
> >
> > Given a reasonable number of hot-spares, I simply can't see the
> > (very) marginal increase in safety given by using HW RAID5 as
> > outweighing the considerable speed hit using RAID5 takes.
>
> I think you're comparing this to software R5, or at least badly
> implemented array code, and divining that there is a considerable speed
> hit when using R5. In practice this is not always the case, provided
> that the response time and interaction between the array cache and
> drives is sufficient for the incoming stream. By moving your
> operation to software you're now introducing more layers between the
> CPU, L1/L2 cache, memory bus, and system bus before you get to the
> interconnect, and further latencies on the storage port and underlying
> device (virtualized or not). Ideally it would be nice to see ZFS-style
> improvements in array firmware, but given the state of embedded
> Solaris and the predominance of 32-bit controllers, I think we're
> going to have some issues. We'd also need some sort of client
> mechanism to interact with the array if we're talking about moving the
> filesystem layer out there .. just a thought
>
> Jon E

What I was trying to provide was the best configuration for those using HW arrays AND ZFS together. I'm not arguing either/or; the discussion centered on the best way to do BOTH.

-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss