Hello,
I am starting this thread so we can discuss the choice of a good key/value
store for the QCOW2 deduplication.
One of the main goals is to keep the ratio between the number of clusters written
and the number of dedup metadata I/Os high.
I initially thought about taking the first two stages
Hi,
This is with reference to the deduplication patch for qcow2 images
(http://lists.gnu.org/archive/html/qemu-devel/2012-11/msg02811.html).
I applied the patch and the code compiled without any error.
I converted a raw image to a qcow2 image using the usual qemu-img convert
command. Then I created
On 28.02.2013 at 10:59, Stefan Hajnoczi wrote:
> On Wed, Feb 27, 2013 at 05:40:53PM +0100, Kevin Wolf wrote:
> > On 27.02.2013 at 16:58, Benoît Canet wrote:
> > > > > The current prototype of the QCOW2 deduplication uses 32-byte SHA256
> > > > > or SKEIN
> > > > > hashes to iden
On Wed, Feb 27, 2013 at 05:40:53PM +0100, Kevin Wolf wrote:
> On 27.02.2013 at 16:58, Benoît Canet wrote:
> > > > The current prototype of the QCOW2 deduplication uses 32-byte SHA256
> > > > or SKEIN
> > > > hashes to identify each 4KB cluster with a very low probability of
> > > > col
On 27.02.2013 at 16:58, Benoît Canet wrote:
> > > The current prototype of the QCOW2 deduplication uses 32-byte SHA256 or
> > > SKEIN
> > > hashes to identify each 4KB cluster with a very low probability of
> > > collisions.
> >
> > How do you handle the rare collision cases? Do you r
> > The current prototype of the QCOW2 deduplication uses 32-byte SHA256 or
> > SKEIN
> > hashes to identify each 4KB cluster with a very low probability of
> > collisions.
>
> How do you handle the rare collision cases? Do you read the original
> cluster and compare the exact contents when th
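
The question above describes a verify-on-match strategy. The following is a minimal sketch of that idea only, not the behavior of the deduplication patches: the class name DedupStore, its fields, and the in-memory list standing in for the image file are all hypothetical. It hashes each 4KB cluster with SHA-256 and, on a hash hit, compares the stored bytes before sharing the cluster.

```python
import hashlib

CLUSTER_SIZE = 4096  # 4KB clusters, as in the quoted mail


class DedupStore:
    """Toy in-memory store (hypothetical; not the qcow2 implementation)."""

    def __init__(self):
        self.clusters = []  # stored cluster contents; list index = cluster number
        self.by_hash = {}   # SHA-256 digest -> cluster number

    def write(self, data: bytes) -> int:
        """Store one cluster and return the cluster number that holds it."""
        if len(data) != CLUSTER_SIZE:
            raise ValueError("data must be exactly one cluster")
        digest = hashlib.sha256(data).digest()
        idx = self.by_hash.get(digest)
        # Hash hit: read the original cluster back and compare the exact bytes.
        if idx is not None and self.clusters[idx] == data:
            return idx  # true duplicate, share the existing cluster
        # Either a new hash or a (very unlikely) collision: store a new cluster.
        self.clusters.append(data)
        new_idx = len(self.clusters) - 1
        if idx is None:
            self.by_hash[digest] = new_idx
        return new_idx
```

The byte comparison costs one extra read per hash hit, which is the trade-off the question is probing.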
On 26.02.2013 at 18:14, Benoît Canet wrote:
>
> Hello Kevin,
>
> As you are the best person to discuss QCOW2 implementation issues with, I am
> writing
> this mail so you know what has been done on deduplication and what I am
> planning to do next.
>
> In short I need your feedback bef
Hello Kevin,
As you are the best person to discuss QCOW2 implementation issues with, I am writing
this mail so you know what has been done on deduplication and what I am
planning to do next.
In short I need your feedback before going into another code sprint and being in
need of another code rev
On Thu, Jan 10, 2013 at 4:18 PM, Benoît Canet wrote:
>> Now I understand. This case covers overwriting existing data with new
>> contents. That is common :).
>>
>> But are you seeing a cluster with refcount > 1 being overwritten
>> often? If so, it's worth looking into why that happens. It may
> Now I understand. This case covers overwriting existing data with new
> contents. That is common :).
>
> But are you seeing a cluster with refcount > 1 being overwritten
> often? If so, it's worth looking into why that happens. It may be a
> common pattern for certain file systems or applica
On Wed, Jan 9, 2013 at 5:40 PM, Benoît Canet wrote:
>> > I.5) cluster removal
>> > When an L2 entry to a cluster becomes stale, the qcow2 code decrements the
>> > refcount.
>> > When the refcount reaches zero, the L2 hash block of the stale cluster
>> > is written to clear the hash.
>> > This happens ofte
On Wed, Jan 9, 2013 at 5:32 PM, Eric Blake wrote:
> On 01/09/2013 09:16 AM, Stefan Hajnoczi wrote:
>
>>> I.6) max refcount reached
>>> The L2 hash block of the cluster is written in order to remember at next
>>> startup
>>> that it must not be used anymore for deduplication. The hash is dropped
> > Two GTrees are used to give access to the hashes: one indexed by hash and
> > another indexed by physical offset.
>
> What is the GTree indexed by physical offset used for?
I think I can get rid of the second GTree for RAM-based deduplication.
It needs to:
- Start qcow2 with the deduplicati
On Wed, Jan 9, 2013 at 4:24 PM, Benoît Canet wrote:
> Here is a mail to open a discussion on QCOW2 deduplication design and
> performance.
>
> The current deduplication strategy is RAM-based.
> One of the goals of the project is to plan and implement an alternative way to
> do
> the lookups from di
>
> What is the GTree indexed by physical offset used for?
It's used for two things: deletion and loading of the hashes.
- Deletion is a hook in the refcount code that triggers when zero is reached.
The only information the code gets is the physical offset of the
to-be-discarded cluster. The hash m
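
The role of the second index in deletion can be sketched as follows. This is an illustration under stated assumptions, not the patch code: plain Python dicts stand in for the two GTrees, and the names HashIndex and on_refcount_zero are hypothetical. The point is that the refcount hook only knows a physical offset, so an offset-to-hash mapping is needed to find the entry to drop from the hash-indexed structure.

```python
class HashIndex:
    """Two in-memory indexes over the same entries (dicts in place of GTrees)."""

    def __init__(self):
        self.by_hash = {}    # digest -> physical offset, used for dedup lookups
        self.by_offset = {}  # physical offset -> digest, used by the deletion hook

    def insert(self, digest: bytes, offset: int) -> None:
        self.by_hash[digest] = offset
        self.by_offset[offset] = digest

    def lookup(self, digest: bytes):
        """Return the physical offset holding this hash, or None."""
        return self.by_hash.get(digest)

    def on_refcount_zero(self, offset: int):
        """Hypothetical hook: called with only the physical offset of the
        cluster being discarded. Removes the entry from both indexes and
        returns the dropped digest, or None if the offset was not indexed."""
        digest = self.by_offset.pop(offset, None)
        if digest is not None:
            self.by_hash.pop(digest, None)
        return digest
```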
On 01/09/2013 09:16 AM, Stefan Hajnoczi wrote:
>> I.6) max refcount reached
>> The L2 hash block of the cluster is written in order to remember at next
>> startup
>> that it must not be used anymore for deduplication. The hash is dropped from
>> the
>> gtrees.
>
> Interesting case. This means
Hello,
Here is a mail to open a discussion on QCOW2 deduplication design and
performance.
The current deduplication strategy is RAM-based.
One of the goals of the project is to plan and implement an alternative way to do
the lookups from disk for bigger images.
I will in a first section enumerate