Neil Perrin wrote:
On 10/22/10 15:34, Peter Taps wrote:
Folks,
Let's say I have a volume being shared over iSCSI. Dedup has been
turned on.
Let's say I copy the same file twice under different names at the
initiator end. Let's say each file ends up taking 5 blocks.
For dedupe to work, each block of one file must match the
corresponding block of the other file. Essentially, each pair of
blocks being compared must start at the same offset into the
actual data.
No, ZFS doesn't care about the file offset, just that the checksums of
the blocks match.
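To picture what that means: every block written to the pool is looked up in a
dedup table keyed only by its checksum, so which file or offset the block came
from never enters into it. Here's a toy Python sketch of that idea (not ZFS
code; I'm assuming a fixed block size and sha256, which is the checksum dedup
uses by default):

    import hashlib, os

    BLOCK = 128 * 1024        # pretend recordsize; a zvol would use its volblocksize
    ddt = {}                  # toy dedup table: checksum -> reference count

    def write_file(data):
        # "Write" data block by block; return how many new blocks got allocated.
        new = 0
        for i in range(0, len(data), BLOCK):
            key = hashlib.sha256(data[i:i + BLOCK]).digest()
            if key in ddt:
                ddt[key] += 1     # seen this block before: just bump the refcount
            else:
                ddt[key] = 1
                new += 1          # unique block: allocate it
        return new

    payload = os.urandom(5 * BLOCK)   # the "5-block" file from the question
    print(write_file(payload))        # 5 - the first copy allocates everything
    print(write_file(payload))        # 0 - the second copy dedupes completely

The second copy dedupes whether it is written under a different name, to a
different dataset, or at different offsets inside some larger file, because the
lookup key is just the block's checksum.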
One conclusion is that one should be careful not to disturb block
alignment when working with large files (like you might have in
virtualization scenarios). I.e., if you have a bunch of virtual machine
image clones, they'll dedupe quite well initially. However, if you then
make seemingly minor changes inside some of those clones (like changing
their partition offsets to get 1 MB alignment), you'll lose most or all of
the dedupe benefits.
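To make that concrete: the traditional partition start at sector 63 is 32256
bytes into the disk, and moving it to a 1 MiB boundary shifts everything behind
it by 1016320 bytes, which isn't a multiple of any power-of-two block size
larger than 512 bytes. Here's a toy illustration of what such a shift does to
block-level dedupe (same caveats as the sketch above; the 512-byte shift stands
in for that residual misalignment):

    import hashlib, os

    BLOCK = 8 * 1024                       # e.g. a zvol's volblocksize
    image = os.urandom(64 * BLOCK)         # stand-in for a guest filesystem image
    realigned = os.urandom(512) + image    # same data, pushed 512 bytes later

    def checksums(buf):
        return {hashlib.sha256(buf[i:i + BLOCK]).digest()
                for i in range(0, len(buf), BLOCK)}

    print(len(checksums(image) & checksums(realigned)))   # 0 - no block lines up any more

Every block boundary now falls at a different point in the data, so every
checksum changes and the dedup table sees nothing but new blocks.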
General-purpose compression tends to be less susceptible to changes in
data offsets, but it also has its limits based on algorithm and dictionary
size. I think dedupe can be viewed as a special case of compression
that happens to work quite well for certain workloads when given ample
hardware resources (compared to what would be needed to run without dedupe).
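For a rough sense of that difference: a streaming compressor like zlib will
still find the repeated data across the 512-byte shift from the previous
sketch, because the repeat falls within its 32 KiB window, whereas block dedupe
gets nothing; push the repeat distance past the window and the compressor loses
it too. A toy comparison (again just a sketch, not a benchmark):

    import hashlib, os, zlib

    BLOCK = 4096
    data = os.urandom(16 * 1024)                 # small enough to fit in zlib's 32 KiB window
    combined = data + os.urandom(512) + data     # a copy of the data, shifted by 512 bytes

    sums = [hashlib.sha256(combined[i:i + BLOCK]).digest()
            for i in range(0, len(combined), BLOCK)]
    print(len(set(sums)), "unique of", len(sums))    # 9 of 9 - block dedupe saves nothing
    ratio = len(combined) / len(zlib.compress(combined, 9))
    print(round(ratio, 2))                           # roughly 2x despite the shift

Grow the 16 KiB buffer past 32 KiB and zlib stops seeing the repeat as well.
Dedupe's table is pool-wide, which is what lets it keep working across whole
sets of near-identical VM images, at the cost of the hardware needed to hold
and search that table.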