In general, you may need to keep data in one dataset if it is somehow
related (e.g. the backup of a specific machine or program, a user's home,
etc.) and if you plan to manage it in a consistent manner. For example,
CIFS shares cannot be nested, so for a unitary share (like "distribs")
you would probably want one dataset. Also, you can only have hardlinks
within one FS dataset, so if you maintain different views into a
distribution set (e.g. sorted by vendor or sorted by software type) and
you do it with hardlinks, you need one dataset as well. If you often
move (link and unlink) files around, e.g. from an "incoming" directory
to final storage, you may or may not want that "incoming" in the same
dataset; this depends on other considerations too.

You want to split datasets when you need them to have different features
and perhaps different uses: to have them as separate shares, to enforce
separate quotas and reservations, perhaps to delegate administration to
particular OS users (e.g. let a user manage snapshots of his own homedir)
and/or local zones. Don't forget about individual dataset properties
(e.g. you may want compression for source code files but not for a
multimedia collection), snapshots and clones, etc. (A short command
sketch of these knobs follows after the quoted points below.)

> 2. space management (we have wasted space in some pools while others
>    are starved)

Well, that's a reason to decrease the number of pools, but not datasets ;)

> 3. tool speed
>
> I do not have good numbers for time to do some of these operations
> as we are down to under 200 datasets (1/3 of the way through the
> migration to the new layout). I do have log entries that point to
> about a minute to complete a `zfs list` operation.
>
> > Would I run into any problems when snapshots are taken (almost)
> > simultaneously from multiple filesystems at once?
>
> Our logs show snapshot creation time at 2 seconds or less, but we
> do not try to do them all at once, we walk the list of datasets and
> process (snapshot and replicate) each in turn.
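For illustration, the per-dataset knobs I mentioned at the top boil down
to simple commands. This is just a sketch - the "pond/src" and
"pond/media" datasets and the user "alice" are made-up examples:

Per-dataset compression, e.g. for source code but not for multimedia:

  # zfs set compression=on pond/src
  # zfs set compression=off pond/media

A separate quota and reservation for one user's home dataset:

  # zfs set quota=20G pond/export/home/alice
  # zfs set reservation=5G pond/export/home/alice

Delegating snapshot administration of that home to its owner, so the
user can create and remove his own snapshots:

  # zfs allow alice snapshot,destroy,mount pond/export/home/alice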
I can partially relate to those numbers. We have a Thumper system running
OpenSolaris SXCE snv_177, with a separate dataset for each user's home
directory, for the backups of each individual remote machine, for each VM
image, each local zone, etc. - in particular, to have a separate snapshot
history and the possibility to clone just what we need.

Its relatively large number of filesystems (about 350) is or is not a
problem, depending on the tool used. For example, a typical import of the
main pool may take up to 8 minutes when in safe mode, but many of the
delays seem to be related to attempts to share_nfs and share_cifs while
the network is down ;)

Auto-snapshots are on, and listing them does indeed take rather long:

[root@thumper ~]# time zfs list -tall -r pond | wc -l
   56528

real    0m18.146s
user    0m7.360s
sys     0m10.084s

[root@thumper ~]# time zfs list -tvolume -r pond | wc -l
       5

real    0m0.096s
user    0m0.025s
sys     0m0.073s

[root@thumper ~]# time zfs list -tfilesystem -r pond | wc -l
     353

real    0m0.123s
user    0m0.052s
sys     0m0.073s

Some operations, like listing the filesystems, SEEM slow due to the
terminal, but in fact are rather quick:

[root@thumper ~]# time df -k | wc -l
     363

real    0m2.104s
user    0m0.094s
sys     0m0.183s

However, low-level system programs may have problems with many FSes; one
known troublemaker is LiveUpgrade. Jens Elkner published a wonderful set
of patches for Solaris 10 and OpenSolaris to limit LU's interest to just
the filesystems that the admin knows to be relevant for the OS upgrade
(they also fix the mount order and other known bugs of that LU software
release):

* http://iws.cs.uni-magdeburg.de/~elkner/luc/lutrouble.html

True, 10000 FSes is not something I have seen myself, so some tools
(especially legacy ones) may break at the sheer number of mountpoints :)

One of my own tricks for cleaning up snapshots, e.g. to quickly relieve
pool space starvation, is to use parallel "zfs destroy" invocations like
this (note the ampersand):

# zfs list -t snapshot -r pond/export/home/user | \
    grep @zfs-auto-snap | awk '{print $1}' | \
    while read Z ; do zfs destroy "$Z" & done

This may spawn several thousand processes (if called for the root
dataset), but they often complete in just 1-2 minutes instead of the
hours a one-by-one series of calls would take; I guess this is because
many ZFS metadata operations are requested within a small timeframe and
get coalesced into a few big writes.
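If spawning several thousand background processes at once is a concern,
the same trick can be throttled. This is just an untested sketch (the
batch size of 100 is arbitrary), pausing for the current batch of
destroys to finish before launching the next one:

# zfs list -H -o name -t snapshot -r pond/export/home/user | \
    grep @zfs-auto-snap | \
    while read Z ; do
        zfs destroy "$Z" &                  # still destroy in parallel...
        N=`expr ${N:-0} + 1`
        [ `expr $N % 100` -eq 0 ] && wait   # ...but wait after every 100 jobs
    done

The last partial batch is simply left to finish on its own; the point is
only to keep the number of simultaneous "zfs destroy" processes bounded.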