On Mon, Nov 02, 2009 at 11:01:34AM -0800, Jeremy Kitchen wrote:
> forgive my ignorance, but what's the advantage of this new dedup over  
> the existing compression option?  Wouldn't full-filesystem compression  
> naturally de-dupe?

If you snapshot/clone as you go, then yes, dedup will do little for you
because you'll already have done the deduplication via snapshots and
clones.  But dedup will give you that benefit even if you don't
snapshot/clone all your data.  Not all data can be managed
hierarchically, with a single dataset at the root of a history tree.

For example, suppose you want to create two VirtualBox VMs running the
same guest OS, sharing as much on-disk storage as possible.  Before
dedup you had to: create one VM, then snapshot and clone that VM's VDI
files, use an undocumented command to change the UUID in the clones,
import them into VirtualBox, and set up the cloned VM using the cloned
VDI files.  (I know because that's how I manage my VMs; it's a pain,
really.)  With dedup you need only enable dedup and then install the two
VMs.

Clearly the dedup approach is far, far easier to use than the
snapshot/clone approach.  And since you can't always snapshot/clone...

There are many cases where snapshot/clone isn't feasible but dedup
can help.  Mail stores are one (though they can do dedup at the
application layer by using message IDs and hashes).  Home directories
are another (think of users saving documents sent via e-mail).  Source
code workspaces (ONNV, Xorg, Linux, whatever) are a third: users might
not think ahead to snapshot/clone a local clone (I also tend to
maintain a local SCM clone that I then snapshot/clone to get
workspaces for bug fixes and projects; it's a pain, really).  I'm sure
there are many, many other examples.

The workspace example is particularly interesting: with the
snapshot/clone approach you get to deduplicate the _source code_, but
not the _object code_, while with dedup you get both dedup'ed
automatically.

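As for compression, that helps whether you dedup or not, and it helps by
about the same factor either way -- dedup and compression are unrelated,
really.  Compression removes redundancy within each block; dedup removes
duplicate blocks.  So no, compression alone won't de-dupe.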
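
To make that concrete, here is a toy model in Python.  It is only a
sketch, not how ZFS is implemented: the block size, the stored_size()
helper, and the made-up "image" contents are all assumptions for
illustration.  It stores two identical files with no shared history and
shows that dedup and per-block compression each save space on their own
and compose when both are on.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size for this toy model


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def stored_size(files, dedup=False, compress=False):
    """Bytes a toy block store would keep for the given file contents."""
    seen = set()
    total = 0
    for data in files:
        for block in split_blocks(data):
            if dedup:
                digest = hashlib.sha256(block).digest()
                if digest in seen:
                    continue  # identical block is already stored once
                seen.add(digest)
            # Compression only ever looks inside a single block.
            total += len(zlib.compress(block)) if compress else len(block)
    return total


# Sixteen distinct blocks, each one internally repetitive (so compressible).
image = b"".join(
    hashlib.sha256(str(i).encode()).digest() * (BLOCK_SIZE // 32)
    for i in range(16)
)
# Two "VMs" installed separately: same bytes, no snapshot/clone relationship.
files = [image, image]

plain = stored_size(files)
deduped = stored_size(files, dedup=True)
compressed = stored_size(files, compress=True)
both = stored_size(files, dedup=True, compress=True)

print("dedup factor without compression:", plain / deduped)
print("dedup factor with compression:   ", compressed / both)
print("compression factor without dedup:", plain / compressed)
print("compression factor with dedup:   ", deduped / both)
```

In this model compression by itself still stores every block twice; only
dedup notices that the second file's blocks are already there.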

Nico
-- 
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
