On 1/28/2011 1:48 PM, Nicolas Williams wrote:
On Fri, Jan 28, 2011 at 01:38:11PM -0800, Igor P wrote:
I created a zfs pool with dedup with the following settings:
zpool create data c8t1d0
zfs create data/shared
zfs set dedup=on data/shared

The thing I was wondering about is that it seems like ZFS only dedups at
the file level and not the block level. When I make multiple copies of a
file to the store I see an increase in the dedup ratio, but when I copy
similar files the ratio stays at 1.00x.
Dedup is done at the block level, not file level.  "Similar files" does
not mean that they actually share common blocks.  You'll have to look
more closely to determine if they do.

Nico

What Nico said.

The big reason here is that identical data has to be ALIGNED on the same block boundaries to be dedup'd.

That is, if I have a file which contains:

AAABBCCCCCCDD

and I have 4-character-wide blocks, then if I copy the file and prepend an "X" to the copy, making it look like:

XAAABBCCCCCCDD


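There will be NO DEDUP in that case. The original splits into the blocks AAAB, BCCC, CCCD, D, while the copy splits into XAAA, BBCC, CCCC, DD, so no block in one file matches any block in the other.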

This is what trips people up most of the time - they see "similar" files, but don't realize that "similar" for dedup has to mean aligned on block boundaries, not just "I've got the same 3k of data in both files".
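
If you want to see the alignment effect for yourself, here is a rough Python sketch of the toy example above. It is only a model under my own assumptions, not how ZFS is actually implemented: the helper names block_hashes and shared_blocks are made up for illustration, the 4-byte block size is the toy value from the example rather than a real recordsize, and SHA-256 is just a convenient stand-in for a block checksum.

```python
import hashlib

def block_hashes(data, block_size):
    """Split data into fixed-size blocks starting at offset 0 and hash each one."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

def shared_blocks(a, b, block_size):
    """Count the blocks of b whose hash also appears among the blocks of a."""
    seen = set(block_hashes(a, block_size))
    return sum(1 for h in block_hashes(b, block_size) if h in seen)

original = b"AAABBCCCCCCDD"
exact_copy = original          # identical file: every block lines up
prepended = b"X" + original    # one byte at the front shifts every block
appended = original + b"X"     # one byte at the end only changes the last block

print(shared_blocks(original, exact_copy, 4))  # 4
print(shared_blocks(original, prepended, 4))   # 0
print(shared_blocks(original, appended, 4))    # 3
```

Note the difference between the last two cases: adding a byte at the end leaves the earlier blocks on the same boundaries, so they still match, while adding a byte at the front shifts everything and nothing matches.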


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
