Re: [Qemu-devel] QCOW2 deduplication design

2013-01-10, Stefan Hajnoczi
On Thu, Jan 10, 2013 at 4:18 PM, Benoît Canet wrote:
>> Now I understand. This case covers overwriting existing data with new
>> contents. That is common :).
>>
>> But are you seeing a cluster with refcount > 1 being overwritten
>> often? If so, it's worth looking into why that happens. It may
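
The question above concerns the copy-on-write path. qcow2 can only overwrite a cluster in place when its refcount is 1, so a write to a deduplicated (shared) cluster must go to a newly allocated cluster and drop one reference from the old one. Below is a minimal Python sketch of that decision, assuming a toy in-memory model (one flat L2 table, a bump allocator). `Image`, `alloc_cluster` and `prepare_overwrite` are hypothetical names, not QEMU functions.

```python
from typing import Dict


class Image:
    """Toy model: a refcount per physical cluster and one flat L2 table."""

    def __init__(self) -> None:
        self.refcount: Dict[int, int] = {}  # physical cluster -> refcount
        self.l2: Dict[int, int] = {}        # guest cluster -> physical cluster
        self.next_free = 0

    def alloc_cluster(self) -> int:
        # Bump allocator standing in for a real free-cluster search.
        cluster = self.next_free
        self.next_free += 1
        self.refcount[cluster] = 1
        return cluster

    def prepare_overwrite(self, guest: int) -> int:
        """Return the physical cluster that should receive the new data."""
        old = self.l2[guest]
        if self.refcount[old] == 1:
            return old                  # sole owner: safe to overwrite in place
        new = self.alloc_cluster()      # shared: copy-on-write to a fresh cluster
        # (a partial-cluster write would first copy the old contents into `new`)
        self.refcount[old] -= 1         # this guest no longer references `old`
        self.l2[guest] = new
        return new


# Guest clusters 0 and 1 were deduplicated onto one physical cluster.
img = Image()
shared = img.alloc_cluster()
img.refcount[shared] = 2
img.l2[0] = img.l2[1] = shared
img.prepare_overwrite(1)  # shared: guest 1 is moved to a new cluster
img.prepare_overwrite(0)  # guest 0 is now the sole owner: written in place
```

Every overwrite of a shared cluster costs an allocation plus a refcount update, which is why it matters how often this case occurs in practice.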

2013-01-10, Benoît Canet
> Now I understand. This case covers overwriting existing data with new
> contents. That is common :).
>
> But are you seeing a cluster with refcount > 1 being overwritten
> often? If so, it's worth looking into why that happens. It may be a
> common pattern for certain file systems or applica

2013-01-10, Stefan Hajnoczi
On Wed, Jan 9, 2013 at 5:40 PM, Benoît Canet wrote:
>> > I.5) cluster removal
>> > When an L2 entry pointing to a cluster becomes stale, the qcow2 code
>> > decrements the refcount.
>> > When the refcount reaches zero, the L2 hash block of the stale cluster
>> > is written to clear the hash.
>> > This happens ofte

2013-01-09, Stefan Hajnoczi
On Wed, Jan 9, 2013 at 5:32 PM, Eric Blake wrote:
> On 01/09/2013 09:16 AM, Stefan Hajnoczi wrote:
>
>>> I.6) max refcount reached
>>> The L2 hash block of the cluster is written in order to remember at next
>>> startup that it must not be used anymore for deduplication. The hash is
>>> dropped

2013-01-09, Benoît Canet
> > Two GTrees are used to give access to the hashes: one indexed by hash and
> > the other indexed by physical offset.
>
> What is the GTree indexed by physical offset used for?

I think I can get rid of the second GTree for RAM-based deduplication. It needs
to:
- Start qcow2 with the deduplicati

2013-01-09, Stefan Hajnoczi
On Wed, Jan 9, 2013 at 4:24 PM, Benoît Canet wrote:
> Here is a mail to open a discussion on QCOW2 deduplication design and
> performance.
>
> The current deduplication strategy is RAM based.
> One of the goals of the project is to plan and implement an alternative way
> to do the lookups from di

2013-01-09, Benoît Canet
> > What is the GTree indexed by physical offset used for?

It's used for two things: deletion and loading of the hashes.

- Deletion is a hook in the refcount code that triggers when zero is reached.
  The only information the code has is the physical offset of the cluster
  about to be discarded. The hash m
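
The deletion path described above has to get from a physical offset back to a hash. Below is a minimal Python sketch of the two-way index, with two dicts standing in for the two GTrees. `DedupIndex`, `HashEntry` and `on_refcount_zero` are hypothetical names. Python dicts are hash tables rather than GLib's balanced binary trees, so this illustrates only the two-way indexing, not the data structure QEMU uses.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HashEntry:
    digest: bytes  # content hash of the cluster
    offset: int    # physical offset of the cluster in the image file


class DedupIndex:
    """Two views of the same entries, standing in for the two GTrees."""

    def __init__(self) -> None:
        self.by_hash: Dict[bytes, HashEntry] = {}
        self.by_offset: Dict[int, HashEntry] = {}

    def insert(self, digest: bytes, offset: int) -> None:
        # Assumed to be called only after lookup() missed, so the digest is new.
        entry = HashEntry(digest, offset)
        self.by_hash[digest] = entry
        self.by_offset[offset] = entry

    def lookup(self, digest: bytes) -> Optional[int]:
        """Write path: return the offset of a cluster with identical content."""
        entry = self.by_hash.get(digest)
        return entry.offset if entry is not None else None

    def on_refcount_zero(self, offset: int) -> None:
        """Refcount hook: only the physical offset is known here, so the
        offset-indexed view is what finds the hash to forget."""
        entry = self.by_offset.pop(offset, None)
        if entry is not None:
            del self.by_hash[entry.digest]
```

Without `by_offset`, `on_refcount_zero` would have to scan every entry in `by_hash` to find the one with a matching offset. The trade-off is storing each entry in two indexes.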

2013-01-09, Eric Blake
On 01/09/2013 09:16 AM, Stefan Hajnoczi wrote:
>> I.6) max refcount reached
>> The L2 hash block of the cluster is written in order to remember at next
>> startup that it must not be used anymore for deduplication. The hash is
>> dropped from the gtrees.
>
> Interesting case. This means
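
The I.6 case quoted above can be sketched as follows. Once a cluster's refcount reaches the maximum, its hash is removed from the index, so the next identical write allocates a fresh cluster instead of overflowing the counter. This sketch assumes 16-bit refcount entries (qcow2's default width, giving a maximum of 65535). `DedupWriter` and its methods are hypothetical. The design's on-disk mark in the L2 hash block, which remembers this across restarts, is left out.

```python
from typing import Dict

MAX_REFCOUNT = 0xFFFF  # assumes 16-bit refcount entries (qcow2's default width)


class DedupWriter:
    def __init__(self) -> None:
        self.by_hash: Dict[bytes, int] = {}   # content hash -> physical cluster
        self.refcount: Dict[int, int] = {}    # physical cluster -> refcount
        self.next_free = 0

    def _alloc(self) -> int:
        # Bump allocator standing in for a real free-cluster search.
        cluster = self.next_free
        self.next_free += 1
        self.refcount[cluster] = 1
        return cluster

    def write(self, digest: bytes) -> int:
        """Return the physical cluster a guest write with this hash maps to."""
        cluster = self.by_hash.get(digest)
        if cluster is None:
            cluster = self._alloc()
            self.by_hash[digest] = cluster
            return cluster
        self.refcount[cluster] += 1
        if self.refcount[cluster] == MAX_REFCOUNT:
            # Saturated: stop offering this cluster for deduplication.
            # (The design also records this in the L2 hash block so that it
            # survives a restart; that on-disk step is omitted here.)
            del self.by_hash[digest]
        return cluster
```

Dropping the hash at the limit means the saturated cluster keeps its existing references, while new writes with the same content start a second copy.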