On Thu, Apr 08, 2010 at 03:48:54PM -0700, Erik Trimble wrote:
> Well....
To be clear, I don't disagree with you; in fact, for a specific part of the
market (at least) and a large part of your commentary, I agree. I just think
you're overstating the case for the rest.

> The problem is (and this isn't just a ZFS issue) that resilver and scrub
> times /are/ very bad for >1TB disks. This goes directly to the problem
> of redundancy - if you don't really care about resilver/scrub issues,
> then you really shouldn't bother to use Raidz or mirroring. It's pretty
> much in the same ballpark.

Sure, and that's why you have raidz3 now; it's also why multi-way mirrors
are getting more attention, as drives are now large enough that capacities
and redundancies previously available only via raidz constructions can be
had with mirrors and a reasonable number of spindles.

Large drives (with the constraints you describe) certainly change the
deployment scenarios. I don't agree that they should never be deployed at
all - which seems to be what you're saying. Take 6x1TB in raidz2 and replace
it with 6x2TB in three-way mirrors: chances are you've just improved
performance. I'm just trying to show it's really not all that black and
white.

As for error rates, this is something zfs should not be afraid of. Indeed,
many of us would be happy to get drives with less internal ECC overhead and
complexity in exchange for greater capacity, and to tolerate the resulting
higher error rates, specifically for use with zfs (sector errors, not
whole-drive failures, of course). Even if that meant I needed raidz4 and
ended up with the same overall usable space, I might prefer to have the
redundancy across drives rather than within them.

> That is, >1TB 3.5" drives have such long resilver/scrub times that with
> ZFS, it's a good bet you can kill a second (or third) drive before you
> can scrub or resilver in time to compensate for the already-failed one.
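A quick sketch of the capacity/redundancy arithmetic behind that 6x1TB
raidz2 vs 6x2TB three-way-mirror swap (idealised figures - metadata, slop
and allocation overhead are ignored):

```python
# Back-of-envelope comparison of the two layouts mentioned above.
# Figures are idealised: usable space is raw capacity minus redundancy.

def raidz(n_drives, drive_tb, parity):
    """Usable TB and failure tolerance for a single raidz vdev."""
    return (n_drives - parity) * drive_tb, parity

def mirrors(n_drives, drive_tb, ways):
    """Usable TB and per-vdev failure tolerance for n-way mirror vdevs."""
    vdevs = n_drives // ways
    return vdevs * drive_tb, ways - 1

# 6x1TB raidz2: 4 TB usable, survives any 2 drive failures
print(raidz(6, 1, 2))
# 6x2TB in two 3-way mirrors: 4 TB usable, survives 2 failures per vdev
print(mirrors(6, 2, 3))
```

Same usable space, same number of tolerated failures per vdev (though the
mirror layout cannot lose three drives from the same vdev, it resilvers by
copying one disk rather than reconstructing from the whole stripe).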
> Put it another way, you get more errors before you have time to fix the
> old ones, which effectively means you now can't fix errors before they
> become permanent. Permanent errors = data loss.

Again, potential zfs improvements could help here:

- resilver in parallel for multiply redundant vdevs with multiple
  failures/replacements (currently, I think resilver restarts in this
  case?)
- scrub one (top-level) vdev at a time, rather than a whole pool. If I
  know I'm about to replace a drive, perhaps for a capacity upgrade, I'll
  scrub first to minimise the chances of tripping over a latent error,
  especially on the previous drive I just replaced. There's no need to
  scrub the other vdevs right now.
- scrub/resilver selectively by dataset, to allow higher-priority data to
  be given better protection.

> For example, the 2TB 5900RPM 3.5" drives are (on average) over 2x as
> slow as the 1TB 7200RPM 3.5" drives for most operations. Access time is
> slower by 40%, and throughput is slower by 30-50%.

Please be fair and compare like with like - say, replacing 5400rpm 1TB
drives. The same problem would apply if you replaced 1TB 7200s with 1TB
5400s; it has little to do with the capacity. Indeed, at the same rpm, the
higher density has the potential to be faster.

> In any case, resilver/scrub times are becoming the dominant factor in
> reliability of these large drives.

Agreed; I'd argue they have been for some time (i.e. even at the 1TB size).

> As a practical matter, small setups are for the most part not
> expandable/upgradable much, if at all. Buy what you need now, and plan
> on rebuying something new in 5-10 years, but don't think that what you
> put together now can be continuously upgraded for a decade.

On this I agree completely, even on a shorter time-scale (say 3-5 years).
With each generation, repurpose the previous generation's hardware for
backup or something else as appropriate. This applies to the drives, and
to the boxes that house them.
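To illustrate why resilver time becomes the dominant reliability factor: a
best-case resilver must read (or write) a whole drive's worth of data, so a
toy lower bound is capacity divided by sustained throughput. The throughput
figures below are illustrative assumptions, not measurements, and real
resilvers are slower still because they are not purely sequential:

```python
# Toy window-of-vulnerability sketch: the minimum resilver time is roughly
# capacity / sustained throughput. Doubling capacity without a matching
# throughput increase widens the window in which a second failure is fatal.

def resilver_hours(capacity_tb, mb_per_s):
    """Lower-bound resilver time in hours (1 TB taken as 1e6 MB)."""
    return capacity_tb * 1e6 / mb_per_s / 3600

# Hypothetical drives: 1TB @ 100 MB/s vs 2TB @ 120 MB/s sustained.
for cap_tb, rate in [(1, 100), (2, 120)]:
    print(f"{cap_tb} TB @ {rate} MB/s: ~{resilver_hours(cap_tb, rate):.1f} h minimum")
```

Even in this idealised best case the 2TB drive is exposed for noticeably
longer than the 1TB one, and in practice the gap is much larger.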
Even so, leave yourself wiggle room for upgrades and other unanticipated
developments in the meantime where you can.

--
Dan.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss