Neil Perrin wrote:
On 10/22/10 15:34, Peter Taps wrote:
Folks,

Let's say I have a volume being shared over iSCSI, with dedup turned on.

Let's say I copy the same file twice under different names at the initiator end. Let's say each file ends up taking 5 blocks.

For dedupe to work, each block of a file must match the corresponding block of the other file. Essentially, each pair of blocks being compared must start at the same offset into the actual data.

No, ZFS doesn't care about the file offset, only that the checksums of the blocks match.
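
In other words, the dedup table is keyed purely by block checksum. Here's a rough Python sketch of the idea (the block size, hash, and names are illustrative stand-ins, not ZFS internals):

import hashlib, os

BLOCK_SIZE = 128 * 1024   # illustrative block size, not a real volblocksize

def dedup_table(data, block_size=BLOCK_SIZE):
    # Key every block by its checksum, the way a dedup table would.
    table = {}
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        table[hashlib.sha256(block).hexdigest()] = block
    return table

file_a = os.urandom(5 * BLOCK_SIZE)   # the "5 blocks" from the example
file_b = file_a                       # same content copied under another name

table = dedup_table(file_a)
count_after_first_copy = len(table)
table.update(dedup_table(file_b))     # second copy adds no new entries
print(count_after_first_copy, len(table))   # both print 5

The file name and where the blocks sit in the file don't enter into it; only the block contents (via the checksum) do.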


One conclusion is that one should be careful not to mess up data alignment when working with large files (like you might have in virtualization scenarios). That is, if you have a bunch of virtual machine image clones, they'll dedupe quite well initially. However, if you then make seemingly minor changes inside some of those clones (like changing their partition offsets to do 1 MB alignment), you'll lose most or all of the dedupe benefit.
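
To make the alignment effect concrete, here's a rough sketch (again, the block size and hash are only stand-ins): shifting the same payload by less than a block changes every per-block checksum, so nothing matches anymore.

import hashlib, os

BLOCK_SIZE = 128 * 1024   # illustrative block size

def block_checksums(data, block_size=BLOCK_SIZE):
    # Set of per-block checksums, ignoring where each block came from.
    return {hashlib.sha256(data[o:o + block_size]).hexdigest()
            for o in range(0, len(data), block_size)}

image = os.urandom(40 * BLOCK_SIZE)    # stand-in for a VM image
clone = image                          # identical clone: full dedup
shifted = os.urandom(512) + image      # same payload pushed 512 bytes down

base = block_checksums(image)
print(len(base & block_checksums(clone)))     # 40 blocks shared
print(len(base & block_checksums(shifted)))   # ~0 blocks shared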

General-purpose compression tends to be less susceptible to changes in data offsets, but it also has its limits, based on the algorithm and dictionary size. I think dedupe can be viewed as a special case of compression that happens to work quite well for certain workloads when given ample hardware resources (compared to what would be needed to run without dedupe).
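
For a rough feel of the offset sensitivity, Python's zlib (DEFLATE, 32 KiB window) can stand in for a general-purpose compressor; the sizes below are arbitrary:

import os, zlib

def ratio(data):
    # Compressed size relative to original size.
    return len(zlib.compress(data, 9)) / len(data)

small = os.urandom(16 * 1024)    # fits inside zlib's 32 KiB window
large = os.urandom(256 * 1024)   # far larger than the window

# A duplicate still inside the window compresses away,
# even when it's shifted by a few hundred bytes:
print(ratio(small * 2), ratio(small + b"\x00" * 512 + small))

# A duplicate that falls outside the window gains almost nothing:
print(ratio(large * 2))

The duplicate within the window is found regardless of the shift, but once it falls outside the window the compressor gains nothing, which is the dictionary-size limit mentioned above.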

