On Fri, Sep 15, 2006 at 01:23:31AM -0700, can you guess? wrote:
> Implementing it at the directory and file levels would be even more
> flexible: redundancy strategy would no longer be tightly tied to path
> location, but directories and files could themselves still inherit
> defaults from the filesystem and pool when appropriate (but could be
> individually handled when desirable).

The problem boils down to not having a way to express your intent that
works over NFS (where you're basically limited by POSIX) and that you
can use from any platform (esp. ones where ZFS isn't installed). If you
have some ideas, this is something we'd love to hear about.

> I've never understood why redundancy was a pool characteristic in ZFS
> - and the addition of 'ditto blocks' and now this new proposal (both
> of which introduce completely new forms of redundancy to compensate
> for the fact that pool-level redundancy doesn't satisfy some needs)
> just makes me more skeptical about it.

We have thought long and hard about this problem and even know how to
implement it (the name we've been using is Metaslab Grids, which isn't
terribly descriptive, or as Matt put it "a bag o' disks"). There are
two main problems with it, though.

One is failures. The problem is that you want the set of disks
implementing redundancy (mirror, RAID-Z, etc.) to be spread across
fault domains (controller, cable, fans, power supplies, geographic
sites) as much as possible. There is no generic mechanism to obtain
this information and act upon it. We could ask the administrator to
supply it somehow, but such a description takes effort, is not easy,
and is prone to error. That's why we have the model right now where the
administrator specifies how they want the disks spread out across
fault groups (vdevs).

The second problem comes back to accounting. If you can specify, on a
per-file or per-directory basis, what kind of replication you want, how
do you answer the statvfs() question? I think the recent "discussions"
on this list illustrate the complexity and passion on both sides of the
argument.

> (Not that I intend in any way to minimize the effort it might take to
> change that decision now.)

The effort is not actually that great. All the hard problems we needed
to solve in order to implement this were basically solved when we did
the RAID-Z code.
As a matter of fact, you can see it in the on-disk specification as
well. In the DVA, you'll notice an 8-bit field labeled "GRID". These
are the bits that would describe, on a per-block basis, what kind of
redundancy we used.

--Bill
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
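
For anyone who wants to see where the 8-bit GRID field mentioned above
sits, here is a rough Python sketch. The bit positions (ASIZE in bits
0-23, GRID in bits 24-31, vdev in bits 32-63 of the first 64-bit DVA
word) come from my reading of the on-disk specification and should be
treated as assumptions. The helper names are hypothetical and are not
part of ZFS, and nothing here says what any particular GRID value would
mean.

```python
# Sketch only: field positions are assumptions based on a reading of the
# ZFS on-disk specification. All names below are hypothetical helpers.
ASIZE_SHIFT, ASIZE_BITS = 0, 24    # allocated size (raw field value)
GRID_SHIFT, GRID_BITS = 24, 8      # the 8-bit "GRID" field
VDEV_SHIFT, VDEV_BITS = 32, 32     # vdev id


def get_bits(word, shift, width):
    """Extract `width` bits starting at bit `shift` from a 64-bit word."""
    return (word >> shift) & ((1 << width) - 1)


def encode_dva_word0(vdev, grid, asize):
    """Pack vdev, grid and asize into the first DVA word."""
    for name, value, width in (("vdev", vdev, VDEV_BITS),
                               ("grid", grid, GRID_BITS),
                               ("asize", asize, ASIZE_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return (vdev << VDEV_SHIFT) | (grid << GRID_SHIFT) | (asize << ASIZE_SHIFT)


def decode_dva_word0(word0):
    """Unpack the first DVA word into its three fields."""
    return {
        "vdev": get_bits(word0, VDEV_SHIFT, VDEV_BITS),
        "grid": get_bits(word0, GRID_SHIFT, GRID_BITS),
        "asize": get_bits(word0, ASIZE_SHIFT, ASIZE_BITS),
    }


if __name__ == "__main__":
    word0 = encode_dva_word0(vdev=3, grid=0x5A, asize=16)
    print(decode_dva_word0(word0))
```

Because the field is carried in every DVA, a per-block redundancy
choice would travel with the block pointer rather than being inferred
from the pool layout, which is the point being made above.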