On 2-Nov-09, at 3:16 PM, Nicolas Williams wrote:

On Mon, Nov 02, 2009 at 11:01:34AM -0800, Jeremy Kitchen wrote:
forgive my ignorance, but what's the advantage of this new dedup over
the existing compression option? Wouldn't full-filesystem compression
naturally de-dupe?
...
There are many examples where snapshot/clone isn't feasible but dedup
can help.  For example: mail stores (though they can do dedup at the
application layer by using message IDs and hashes).  For example: home
directories (think of users saving documents sent via e-mail).  For
example: source code workspaces (ONNV, Xorg, Linux, whatever), where
users might not think ahead to snapshot/clone a local clone (I also tend
to maintain a local SCM clone that I then snapshot/clone to get
workspaces for bug fixes and projects; it's a pain, really).  I'm sure
there are many, many other examples.
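The application-layer dedup Nico mentions for mail stores can be sketched as content-addressed storage keyed by a hash of the message body. This is a hypothetical toy (the MailStore class and its method names are made up for illustration, not any real mail server's API):

```python
import hashlib

class MailStore:
    """Toy content-addressed mail store: identical message bodies are
    stored once and shared by reference (illustrative sketch only)."""

    def __init__(self):
        self.blobs = {}      # content hash -> message body (stored once)
        self.mailboxes = {}  # user -> list of content hashes

    def store_message(self, user, body):
        digest = hashlib.sha256(body).hexdigest()
        # Only store the body if this exact content hasn't been seen before.
        self.blobs.setdefault(digest, body)
        self.mailboxes.setdefault(user, []).append(digest)
        return digest

store = MailStore()
h1 = store.store_message("alice", b"meeting at 3pm")
h2 = store.store_message("bob", b"meeting at 3pm")  # same content, stored once
```

The point of block-level dedup in the filesystem is that applications get this sharing without writing any of the above themselves.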

A couple more come to mind; some patterns become much cheaper with dedup:

- The Subversion working copy format, where you have the reference checked-out file alongside the working file
- QA/testing systems, where you might have dozens or hundreds of builds or iterations of an application, mostly identical

Exposing checksum metadata might have interesting implications for operations like diff, cmp, rsync, even tar.
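To make the cmp case concrete: if the filesystem exposed per-block checksums, a comparison tool could skip any block whose checksums already match. No such API is assumed here, so this sketch computes the hashes itself in user space; the function name and 128 KiB block size are made up for illustration:

```python
import hashlib
import os
import tempfile

def blocks_differ(path_a, path_b, block_size=128 * 1024):
    """Yield byte offsets of blocks whose contents differ.

    We hash each block ourselves; a filesystem exposing its own block
    checksums (hypothetical) could answer the same question without
    reading unchanged data at all."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            chunk_a = fa.read(block_size)
            chunk_b = fb.read(block_size)
            if not chunk_a and not chunk_b:
                return
            if hashlib.sha256(chunk_a).digest() != hashlib.sha256(chunk_b).digest():
                yield offset
            offset += block_size

# Usage: two files identical in the first 128 KiB block, differing after it.
with tempfile.TemporaryDirectory() as d:
    path_a, path_b = os.path.join(d, "a"), os.path.join(d, "b")
    with open(path_a, "wb") as f:
        f.write(b"x" * 131072 + b"hello")
    with open(path_b, "wb") as f:
        f.write(b"x" * 131072 + b"world")
    diffs = list(blocks_differ(path_a, path_b))
```

rsync's delta algorithm already does something in this spirit over the network; the interesting part would be getting the checksums for free from the filesystem.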

--Toby


The workspace example is particularly interesting: with the
snapshot/clone approach you get to deduplicate the _source code_, but
not the _object code_, while with dedup you get both dedup'ed
automatically.
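Concretely, turning this on for a workspace dataset is a one-line property change (these are the real ZFS commands from that era; the tank/ws pool and dataset names are placeholders), after which both source and object files written by any build dedup automatically:

```shell
# Enable block-level dedup on the workspace dataset (tank/ws is a
# placeholder name); applies to data written from this point on.
zfs set dedup=on tank/ws

# Later, check how much duplication was actually found, pool-wide.
zpool get dedupratio tank
```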

As for compression, that helps whether you dedup or not, and it helps by about the same factor either way -- dedup and compression are unrelated,
really.

Nico
--
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
