On Mon, Feb 11, 2013 at 03:50:10AM +0100, Benoît Canet wrote:
> As you can read, dedup keeps me awake at night.
>
> I still think that there is a need for a deduplication implementation
> that would perform nearly as fast as regular qcow2.
>
> I thought about this: http://en.wikipedia.org/wiki/Normal_distribution.
>
> Not all blocks are equal for deduplication.
> Some will deduplicate well and some won't.
>
> My idea would be to run a filter periodically on the in-RAM tree in order
> to drop the less performing and less promising blocks.
>
> The less performing blocks involved in a deduplication operation since the
> last run of the filter would be kept because they are promising, so they
> would survive and have a chance to climb among the top performers.
>
> The less performing blocks not involved in a deduplication operation since
> the last run of the filter would be definitively dropped from the HashNode
> tree since they are losers.
>
> The center of the bell curve would be kept since they are champions.
>
> This way this RAM-based implementation could offer speed while keeping its
> memory usage limited.
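
If I understand the idea correctly, the periodic filter could look roughly
like the following untested sketch. The HashNode fields, the thresholds, and
the flat array store are made up here for illustration; they are not the
actual qcow2-dedup patch code:

/* Illustrative sketch only -- field names, thresholds and the flat array
 * store are assumptions for this example, not the qcow2-dedup patches. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t hash[32];        /* content hash of the cluster (lookup omitted) */
    uint64_t dedup_count;    /* how many dedup hits this hash produced */
    uint64_t birth_gen;      /* filter generation when the node was inserted */
    uint64_t last_hit_gen;   /* filter generation of the most recent hit */
    bool in_use;
} HashNode;

#define MAX_NODES      1024  /* hard bound on in-RAM entries */
#define YOUNG_GENS     2     /* grace period for young hashes */
#define MIN_DEDUP_HITS 2     /* score below this counts as "low performing" */

static HashNode nodes[MAX_NODES];
static uint64_t current_gen;

/* Record a dedup hit against an existing node (hash lookup code omitted). */
static void hash_node_hit(HashNode *n)
{
    n->dedup_count++;
    n->last_hit_gen = current_gen;
}

/* Periodic filter: drop low-scoring nodes that were not hit since the last
 * run, unless they are still young enough to deserve a chance. */
static void filter_hash_nodes(void)
{
    for (int i = 0; i < MAX_NODES; i++) {
        HashNode *n = &nodes[i];
        if (!n->in_use) {
            continue;
        }
        bool young = current_gen - n->birth_gen < YOUNG_GENS;
        bool recently_hit = n->last_hit_gen == current_gen;
        bool low_score = n->dedup_count < MIN_DEDUP_HITS;

        if (low_score && !recently_hit && !young) {
            n->in_use = false;   /* loser: evict from the in-RAM store */
        }
        /* champions (high score) and promising nodes (recent hit or young)
         * survive until the next run */
    }
    current_gen++;
}

int main(void)
{
    nodes[0] = (HashNode){ .birth_gen = 0, .in_use = true }; /* never hit */
    nodes[1] = (HashNode){ .birth_gen = 0, .in_use = true }; /* hit each run */

    for (int gen = 0; gen < 4; gen++) {
        hash_node_hit(&nodes[1]);
        filter_hash_nodes();
    }
    printf("node 0: %s, node 1: %s\n",
           nodes[0].in_use ? "kept" : "evicted",
           nodes[1].in_use ? "kept" : "evicted");
    return 0;
}

The memory bound then comes from the fixed size of the store, and the grace
period gives freshly inserted hashes a few filter runs to prove themselves
before they can be evicted.
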
This means inline dedup is opportunistic and not guaranteed to catch every
dedup. There needs to be a trade-off between a hash's dedup score and its
age. Young hashes are allowed to stay for a while even with low dedup scores,
so they have a chance to accumulate dedups.

I still think a lookup data structure that spills to disk is better, but
perhaps you have data that shows it's reasonable to expect decent dedup rates
with the opportunistic approach?

Stefan