Re: [Qemu-devel] QCOW2 deduplication

Stefan Hajnoczi Thu, 28 Feb 2013 02:00:31 -0800

On Wed, Feb 27, 2013 at 05:40:53PM +0100, Kevin Wolf wrote:
> On 27.02.2013 at 16:58, Benoît Canet wrote:
> > > > The current prototype of the QCOW2 deduplication uses 32-byte SHA256
> > > > or SKEIN hashes to identify each 4KB cluster with a very low
> > > > probability of collisions.
> > > 
> > > How do you handle the rare collision cases? Do you read the original
> > > cluster and compare the exact contents when the hashes match?
> > 
> > Stefan found a paper with the math required to compute the collision
> > probability: http://plan9.bell-labs.com/sys/doc/venti/venti.html
> >              (Section 3.1)
> > Doing the math for 1 Exabyte of stored data with 4KB clusters and 256-bit
> > hashes gives a probability of 2.57E-49.
> > Since the probability is low enough, I plan to implement the read/compare
> > as an option that users can toggle.
> > The people who wrote the deduplication in ZFS have done it this way.
> 
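
For reference, the 2.57E-49 figure can be reproduced with a few lines of
Python. This is only a sketch: it assumes the usual birthday bound
p <= n(n-1)/2 * 2^-b, that 1 Exabyte means 10^18 bytes, and that a 4KB
cluster is 4096 bytes. The function name collision_bound is made up for
this example.

```python
from fractions import Fraction


def collision_bound(total_bytes, cluster_bytes, hash_bits):
    """Upper bound on the chance that any two distinct clusters share a hash.

    Uses the birthday bound p <= n(n-1)/2 * 2^-b, where n is the number
    of clusters and b is the hash width in bits.
    """
    n = total_bytes // cluster_bytes
    return Fraction(n * (n - 1), 2 * 2**hash_bits)


# Assumption: 1 Exabyte = 10**18 bytes, 4KB cluster = 4096 bytes.
p = collision_bound(10**18, 4096, 256)
print(f"{float(p):.2e}")  # prints 2.57e-49
```

Exact rational arithmetic avoids floating-point underflow in the
intermediate terms; the value is only converted to a float for printing.
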
> Fair enough. If you want to gamble with your data for some more
> performance, you can turn it off. Should we add some compatible taint
> flag after the image has been used without collision detection?


If the verification setting is stored in the qcow2 image header then
it's essentially a taint flag.
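
To make the trade-off concrete, here is a minimal in-memory sketch of
hash-based deduplication with an optional read/compare step. It is not
the prototype's code: the class DedupStore, its verify flag, and the
list-backed cluster storage are hypothetical names chosen for this
example only.

```python
import hashlib


class DedupStore:
    """Toy cluster store; 'verify' models the read/compare setting."""

    def __init__(self, verify=True):
        self.verify = verify
        self.clusters = []  # physical clusters, addressed by list index
        self.index = {}     # SHA256 digest -> physical cluster index

    def write(self, data: bytes) -> int:
        """Store one cluster and return the physical index it maps to."""
        digest = hashlib.sha256(data).digest()
        idx = self.index.get(digest)
        if idx is not None:
            # Hash match: trust it, or read the stored cluster and compare.
            if not self.verify or self.clusters[idx] == data:
                return idx
            # Hash collision caught by verification: store the data
            # separately and leave the index entry untouched.
            self.clusters.append(data)
            return len(self.clusters) - 1
        self.clusters.append(data)
        self.index[digest] = len(self.clusters) - 1
        return self.index[digest]


store = DedupStore(verify=True)
first = store.write(b"\0" * 4096)
second = store.write(b"\0" * 4096)  # deduplicated against the first write
```

With verify=False a hash match is always treated as a duplicate, which
is the faster mode that the taint flag question is about.
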

Stefan
