On Apr 8, 2010, at 9:06 PM, Daniel Carosone wrote:

> On Thu, Apr 08, 2010 at 08:36:43PM -0700, Richard Elling wrote:
>> On Apr 8, 2010, at 6:19 PM, Daniel Carosone wrote:
>>>
>>> As for error rates, this is something zfs should not be afraid
>>> of. Indeed, many of us would be happy to get drives with less internal
>>> ECC overhead and complexity for greater capacity, and tolerate the
>>> resultant higher error rates, specifically for use with zfs (sector
>>> errors, not overall drive failure, of course). Even if it means I
>>> need raidz4, and wind up with the same overall usable space, I may
>>> prefer the redundancy across drives rather than within.
>>
>> Disagree. Reliability trumps availability every time.
>
> Often, but not sure about every.
I am quite sure.

> The economics shift around too fast
> for such truisms to be reliable, and there's always room for an
> upstart (often in a niche) to make great economic advantages out of
> questioning this established wisdom. The oft-touted example is
> google's servers, but there are many others.

A small change in reliability for massively parallel systems has a significant, multiplicative effect on the overall system. Companies like Google weigh many factors, including component reliability, when designing their systems.

>
>> And the problem
>> with the availability provided by redundancy techniques is that the
>> amount of work needed to recover is increasing. This work is limited
>> by latency and HDDs are not winning any latency competitions anymore.
>
> We're talking about generalities; the niche can be very important to
> enable these kinds of tricks by holding some of the other troubling
> variables constant (e.g. application/programming platform). It
> doesn't really matter whether you're talking about 1 dual-PSU server
> vs 2 single-PSU servers, or whole datacentres - except that solid
> large-scale diversity tends to lessen your concentration (and perhaps
> spend) on internal redundancy within a datacentre (or disk).
>
> Put another way: some application niches are much more able to adopt
> redundancy techniques that don't require so much work.

At the other extreme, if disks were truly reliable, the only RAID that would matter is RAID-0.

> Again, for the google example: if you're big and diverse enough that
> shifting load between data centres on failure is no work, then
> moving the load for other reasons is viable too - such as moving
> to where it's night time and power and cooling are cheaper. The work
> has been done once, up front, and the benefits are repeatable.

Most folks never even get to a decent disaster recovery design, let alone a full datacenter mirror :-(

>> To combat this, some vendors are moving to an overprovision model.
>> Current products deliver multiple "disks" in a single FRU with built-in,
>> fine-grained redundancy. Because the size and scope of the FRU is
>> bounded, the recovery can be optimized and the reliability of the FRU
>> is increased.
>
> That's not new. Past examples in the direct experience of this
> community include the BladeStor and SSA-1000 storage units, which
> aggregated disks into failure domains (e.g. drawers) for a (big)
> density win.

Nope. The FRUs for BladeStor and SSA-100 were traditional disks. To see something different you need to rethink the "disk" -- something like a Xiotech ISE.

-- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
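For anyone who wants to put numbers on the "multiplicative effect" point above, here is a minimal sketch, assuming independent drive failures. The fleet size and per-drive survival probabilities are made-up illustrative figures, not values from the thread:

    # Probability that every drive in a fleet of n independent drives
    # survives a given interval, for per-drive survival probability r.
    # Numbers are illustrative only.
    def fleet_survival(r: float, n: int) -> float:
        return r ** n

    n = 10_000
    for r in (0.999, 0.998):
        print(f"r = {r}: P(all {n} drives survive) = {fleet_survival(r, n):.1e}")

    # Expected output:
    # r = 0.999: P(all 10000 drives survive) = 4.5e-05
    # r = 0.998: P(all 10000 drives survive) = 2.0e-09

Under these assumptions, a 0.1 percentage-point drop in per-drive reliability costs roughly four orders of magnitude in the chance of the whole fleet staying intact over the interval, which is one way to see why component reliability matters so much at scale.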