On Mon, Nov 02, 2009 at 11:01:34AM -0800, Jeremy Kitchen wrote:
> forgive my ignorance, but what's the advantage of this new dedup over
> the existing compression option?  Wouldn't full-filesystem compression
> naturally de-dupe?

If you snapshot/clone as you go, then yes, dedup will do little for you
because you'll already have done the deduplication via snapshots and
clones.  But dedup will give you that benefit even if you don't
snapshot/clone all your data.  Not all data can be managed
hierarchically, with a single dataset at the root of a history tree.

For example, suppose you want to create two VirtualBox VMs running the
same guest OS, sharing as much on-disk storage as possible.  Before
dedup you had to:

 - create one VM,
 - snapshot and clone that VM's VDI files,
 - use an undocumented command to change the UUID in the clones,
 - import them into VirtualBox, and
 - set up the cloned VM using the cloned VDI files.

(I know because that's how I manage my VMs; it's a pain, really.)

With dedup you need only enable dedup and then install the two VMs.

Clearly the dedup approach is far, far easier to use than the
snapshot/clone approach.  And since you can't always snapshot/clone...

There are many examples where snapshot/clone isn't feasible but dedup
can help:

 - mail stores (though they can do dedup at the application layer by
   using message IDs and hashes);
 - home directories (think of users saving documents sent via e-mail);
 - source code workspaces (ONNV, Xorg, Linux, whatever), where users
   might not think ahead to snapshot/clone a local clone (I also tend to
   maintain a local SCM clone that I then snapshot/clone to get
   workspaces for bug fixes and projects; it's a pain, really).

I'm sure there are many, many other examples.

The workspace example is particularly interesting: with the
snapshot/clone approach you get to deduplicate the _source code_, but
not the _object code_, while with dedup you get both dedup'ed
automatically.

As for compression, that helps whether you dedup or not, and it helps by
about the same factor either way -- dedup and compression are unrelated,
really.

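To make that last point concrete, here is a toy model in Python.  It is
not ZFS code, only a sketch under stated assumptions: fixed 4 KiB
blocks, SHA-256 block hashes, and zlib compression applied per block.
The names `make_blocks`, `store`, `guest_os`, `vm1`, and `vm2` are made
up for illustration, and it assumes that two installs of the same guest
OS produce identical blocks, which real installs only approximate.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed block size for this toy model


def make_blocks(label, count):
    """Build `count` distinct but highly compressible blocks."""
    blocks = []
    for i in range(count):
        pattern = f"{label} block {i:08d} ".encode()
        repeats = BLOCK_SIZE // len(pattern) + 1
        blocks.append((pattern * repeats)[:BLOCK_SIZE])
    return blocks


def store(files, dedup=False, compress=False):
    """Return the bytes a toy block store would use for `files`.

    Each file is a list of blocks.  With dedup, a block whose SHA-256
    digest was already seen costs nothing extra.  With compress, every
    block that is written is zlib-compressed on its own.
    """
    seen = set()
    used = 0
    for blocks in files:
        for block in blocks:
            if dedup:
                digest = hashlib.sha256(block).digest()
                if digest in seen:
                    continue  # duplicate block: nothing new is written
                seen.add(digest)
            used += len(zlib.compress(block)) if compress else len(block)
    return used


# Two hypothetical VMs: the same guest OS blocks plus a little per-VM data.
guest_os = make_blocks("os", 64)
vm1 = guest_os + make_blocks("vm1", 4)
vm2 = guest_os + make_blocks("vm2", 4)

for dedup in (False, True):
    raw = store([vm1, vm2], dedup=dedup)
    packed = store([vm1, vm2], dedup=dedup, compress=True)
    print(f"dedup={dedup}: raw={raw} compressed={packed} "
          f"factor={raw / packed:.1f}")
```

In this model dedup removes the second copy of the shared guest OS
blocks without any snapshot/clone step, and compression shrinks whatever
actually gets written by about the same factor in both runs.
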
Nico
-- 
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss