On Thu, Jan 10, 2013 at 4:18 PM, Benoît Canet <benoit.ca...@irqsave.net> wrote:
>> Now I understand.  This case covers overwriting existing data with new
>> contents.  That is common :).
>>
>> But are you seeing a cluster with refcount > 1 being overwritten
>> often?  If so, it's worth looking into why that happens.  It may be a
>> common pattern for certain file systems or applications to write
>> initial data 'A' first and then change it later.  This actually
>> suggests against online dedup, or at least for something like qcow2
>> delayed write where we don't "commit" yet because the guest will
>> probably still modify or append to the data.
>
> I apologize for the bogus former information.
>
> The deduplication metrics accounting code was confusing the delete cluster
> operation with the more common hash removal from tree operation.
> After fixing the metrics code, common file manipulations on the guest only
> generate a few delete cluster operations.
>
> The cases where lots of clusters are deleted are when the image is
> overwritten with zeroes and when reformatting a partition with ext3.

Eric raised a good point with zero detection.  Maybe we should turn it
on when dedup is enabled since we'll be scanning the data buffer anyway.
It places special zero cluster markers in the L2 entry and never stores
zeroes at all.

The advantage of doing this is that we don't dedup zero clusters and
never hit the refcount limits on this relatively common operation.

Stefan
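
To make the zero detection idea concrete, here is a rough sketch of how the write path could branch before the dedup hash lookup. It is not taken from the qcow2 code: `SKETCH_L2_ZERO_FLAG`, `cluster_is_zero()` and `classify_cluster()` are hypothetical names, and the flag value is only a stand-in for whatever marker the L2 entry really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the "reads as zeroes" marker in an L2 entry. */
#define SKETCH_L2_ZERO_FLAG (1ULL << 0)

/* Which path a cluster write should take (names are assumptions). */
enum write_path {
    WRITE_ZERO_MARKER,  /* only update the L2 entry, store no data */
    WRITE_DEDUP         /* hash the cluster and dedup it as usual */
};

/* Return true if the whole buffer contains only zero bytes. */
static bool cluster_is_zero(const uint8_t *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    /* First byte is zero and every byte equals its successor. */
    return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
}

/*
 * Decide how to handle one cluster of guest data.  For an all-zero
 * cluster, set the marker in *l2_entry and skip dedup entirely, so no
 * host cluster is allocated and no refcount is incremented.  Otherwise
 * leave *l2_entry untouched and let the caller run the dedup path.
 */
static enum write_path classify_cluster(const uint8_t *buf, size_t len,
                                        uint64_t *l2_entry)
{
    if (cluster_is_zero(buf, len)) {
        *l2_entry = SKETCH_L2_ZERO_FLAG;
        return WRITE_ZERO_MARKER;
    }
    return WRITE_DEDUP;
}
```

Since the dedup code has to read the whole buffer to hash it anyway, the extra scan should be cheap in comparison, and zero-filled images or freshly formatted partitions would never touch the refcount of a shared cluster.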