On Wednesday 02 Jan 2013 at 10:47:48 (-0800), ronnie sahlberg wrote:
> Do you really need to resolve the conflicts?
> It might be easier and sufficient to just flag those hashes where a
> conflict has been detected as: "don't dedup this hash anymore,
> collisions have been seen."
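A minimal sketch of the flagging idea quoted above, in Python rather than QEMU's C, and not taken from the patchset. The class name DedupStore, its fields, and the injectable hash_fn are all hypothetical; the injectable hash exists only so that a test can force collisions with a deliberately weak hash instead of searching for real SHA-256 collisions.

```python
import hashlib


class DedupStore:
    """Toy in-memory cluster store with verify-on-write deduplication."""

    def __init__(self, hash_fn=None):
        # hash_fn is injectable (an assumption of this sketch) so that a
        # regression test can provoke collisions with a weak hash.
        self.hash_fn = hash_fn or (lambda data: hashlib.sha256(data).digest())
        self.clusters = []     # cluster index -> cluster content
        self.by_hash = {}      # hash -> index of the cluster used for dedup
        self.no_dedup = set()  # hashes on which a collision has been seen

    def _append(self, data):
        self.clusters.append(data)
        return len(self.clusters) - 1

    def write(self, data):
        """Store one cluster and return the index it can be read back from."""
        h = self.hash_fn(data)
        if h in self.no_dedup:
            # A collision was seen earlier: never deduplicate on this hash.
            return self._append(data)
        if h in self.by_hash:
            idx = self.by_hash[h]
            # The "verify" step: read the stored cluster back and compare.
            if self.clusters[idx] == data:
                return idx
            # Same hash, different content: flag the hash, store separately.
            self.no_dedup.add(h)
            return self._append(data)
        idx = self._append(data)
        self.by_hash[h] = idx
        return idx
```

Passing something like hash_fn=lambda d: d[:1] makes any two clusters that share a first byte collide, which exercises the flagging path without any special hardware.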
True, that's more elegant. The user would still need to specify the verify
option at creation, and it would require a read before the verify step, but
it would not make the qcow2 format uglier.

>
>
> On Wed, Jan 2, 2013 at 10:40 AM, Benoît Canet <benoit.ca...@irqsave.net> wrote:
> > On Wednesday 02 Jan 2013 at 12:26:37 (-0600), Troy Benjegerdes wrote:
> >> The probability may be 'low' but it is not zero. Just because it's
> >> hard to calculate the hash doesn't mean you can't do it. If your
> >> input data is not random the probability of a hash collision is
> >> going to get skewed.
> >>
> >> Read about how Bitcoin uses hashes.
> >>
> >> I need a budget of around $10,000 or so for some FPGAs and/or GPU cards,
> >> and I can make a regression test that will create deduplication hash
> >> collisions on purpose.
> >
> > It's not a problem: as Eric pointed out while reviewing the previous
> > patchset, there is a small space left filled with zeroes in the
> > deduplication block.
> > A bit could be set in it when a collision is detected, and an offset could
> > point to a cluster used to resolve collisions.
> >
> >>
> >>
> >> On Wed, Jan 02, 2013 at 06:33:24PM +0100, Benoît Canet wrote:
> >> > > How does this code handle hash collisions, and do you have some
> >> > > regression tests that purposefully create a dedup hash collision,
> >> > > and verify that the 'right thing' happens?
> >> >
> >> > The two hash functions that can be used are cryptographic and not
> >> > broken yet.
> >> > So nobody knows how to generate a collision.
> >> >
> >> > You can do the math to calculate the probability of collision using a
> >> > 256 bit hash while processing 1EiB of data; the result is so low you
> >> > can consider it won't happen.
> >> > The sha256 ZFS deduplication works the same way regarding collisions.
> >> >
> >> > I currently use qemu-io-test for testing purposes and iozone with the
> >> > -w flag in the guest.
> >> > I would like to find a good deduplication stress test to run in a guest.
> >> >
> >> > Regards
> >> >
> >> > Benoît
> >> >
> >> > > It's great that this almost works, but it seems rather dangerous to put
> >> > > something like this into the mainline code without some regression
> >> > > tests.
> >> > >
> >> > > (I'm also suspecting the regression test will be a great way to find
> >> > > flakey hardware)
> >> > >
> >> > > --------------------------------------------------------------------------
> >> > > Troy Benjegerdes 'da hozer' ho...@hozed.org
> >> > >
> >> > > Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
> >> > > software & hardware (http://q3u.be) stuff and not get a real job.
> >> > > Charles Shultz had the best answer:
> >> > >
> >> > > "Why do musicians compose symphonies and poets write poems? They do it
> >> > > because life wouldn't have any meaning for them if they didn't. That's
> >> > > why I draw cartoons. It's my life." -- Charles Shultz
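
The "do the math" estimate quoted in the thread (256-bit hash, 1 EiB of data) can be reproduced with the standard birthday bound, p <= n(n-1)/2 / 2^bits for n hashed clusters. The sketch below is only an illustration: the 4 KiB cluster size is an assumption (the thread does not state one), the function names are made up, and the bound assumes the hash behaves like a uniformly random function, which is exactly the assumption questioned in the thread for non-random or adversarial input.

```python
import math
from fractions import Fraction


def collision_bound(data_bytes, cluster_bytes, hash_bits):
    """Birthday upper bound on the chance that any two clusters collide."""
    n = data_bytes // cluster_bytes  # number of clusters that get hashed
    # n * (n - 1) / 2 pairs, each colliding with probability 2**-hash_bits.
    return Fraction(n * (n - 1), 2 ** (hash_bits + 1))


def log2_collision_bound(data_bytes, cluster_bytes, hash_bits):
    """Same bound expressed as a base-2 exponent, which is easier to read."""
    p = collision_bound(data_bytes, cluster_bytes, hash_bits)
    if p == 0:
        return float("-inf")
    # Work on the integer numerator and denominator to avoid float underflow.
    return math.log2(p.numerator) - math.log2(p.denominator)


# 1 EiB of data, assumed 4 KiB clusters, 256-bit hash.
exponent = log2_collision_bound(2**60, 4096, 256)
print(f"collision probability is at most about 2**{exponent:.0f}")
```

With these assumed parameters the bound comes out at roughly 2**-161; larger clusters mean fewer hashes and an even smaller bound.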