Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-18 Thread Nicolas Williams
On Tue, Jan 18, 2011 at 07:16:04AM -0800, Orvar Korvar wrote: > BTW, I thought about this. What do you say? > > Assume I want to compress data and I succeed in doing so. And then I > transfer the compressed data. So all the information I transferred is > the compressed data. But, then you don't co

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-18 Thread Orvar Korvar
Totally Off Topic: Very interesting. Did you produce some papers on this? Where do you work? Seems very fun place to work at! BTW, I thought about this. What do you say? Assume I want to compress data and I succeed in doing so. And then I transfer the compressed data. So all the information I

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-18 Thread Orvar Korvar
"...If this is a general rule, maybe it will be worth considering using SHA512 truncated to 256 bits to get more speed..." Doesn't it need more investigation if truncating 512bit to 256bit gives equivalent security as a plain 256bit hash? Maybe truncation will introduce some bias?

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-17 Thread Nicolas Williams
On Sat, Jan 15, 2011 at 10:19:23AM -0600, Bob Friesenhahn wrote: > On Fri, 14 Jan 2011, Peter Taps wrote: > > >Thank you for sharing the calculations. In lay terms, for Sha256, > >how many blocks of data would be needed to have one collision? > > Two. Pretty funny. In this thread some of you ar

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-15 Thread Bob Friesenhahn
On Fri, 14 Jan 2011, Peter Taps wrote: Thank you for sharing the calculations. In lay terms, for Sha256, how many blocks of data would be needed to have one collision? Two. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintai

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-15 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Peter Taps > > Thank you for sharing the calculations. In lay terms, for Sha256, how many > blocks of data would be needed to have one collision? There is no point in making a generalization a

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-15 Thread Pawel Jakub Dawidek
On Fri, Jan 14, 2011 at 11:32:58AM -0800, Peter Taps wrote: > Ed, > > Thank you for sharing the calculations. In lay terms, for Sha256, how many > blocks of data would be needed to have one collision? > > Assuming each block is 4K in size, we probably can calculate the final data > size beyond

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-14 Thread David Magda
On Jan 14, 2011, at 14:32, Peter Taps wrote: > Also, another related question. Why 256 bits was chosen and not 128 bits or > 512 bits? I guess Sha512 may be an overkill. In your formula, how many blocks > of data would be needed to have one collision using Sha128? There are two ways to get 128

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-14 Thread Peter Taps
I am posting this once again as my previous post went into the middle of the thread and may go unnoticed. Ed, Thank you for sharing the calculations. In lay terms, for Sha256, how many blocks of data would be needed to have one collision? Assuming each block is 4K in size, we probably can calc

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-14 Thread Peter Taps
Ed, Thank you for sharing the calculations. In lay terms, for Sha256, how many blocks of data would be needed to have one collision? Assuming each block is 4K in size, we probably can calculate the final data size beyond which the collision may occur. This would enable us to make the following

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-12 Thread Edward Ned Harvey
> Edward, this is OT but may I suggest you to use something like Wolfram Alpha > to perform your calculations a bit more comfortably? Wow, that's pretty awesome. Thanks.

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-12 Thread Enrico Maria Crisostomo
Edward, this is OT but may I suggest you to use something like Wolfram Alpha to perform your calculations a bit more comfortably? -- Enrico M. Crisostomo On Jan 12, 2011, at 4:24, Edward Ned Harvey wrote: > For anyone who still cares: > > I'm calculating the odds of a sha256 collision in an

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-11 Thread Edward Ned Harvey
In case you were wondering "how big is n before the probability of collision becomes remotely possible, slightly possible, or even likely?" Given a fixed probability of collision p, the formula to calculate n is: n = 0.5 + sqrt( ( 0.25 + 2*l(1-p)/l((d-1)/d) ) ) (That's just the same equation
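
The inverse formula quoted above can be evaluated directly in floating point if the logarithms are computed with `log1p`, which stays accurate when its argument is as tiny as 1/2^256. The following is a minimal sketch, not part of the original post; the function name `blocks_for_probability` is made up for illustration, and `l(...)` in the post is taken to mean the natural logarithm, as in `bc`.

```python
import math


def blocks_for_probability(p: float, d: int) -> float:
    """Number of blocks n at which the collision probability reaches p.

    Implements n = 0.5 + sqrt(0.25 + 2*ln(1-p)/ln((d-1)/d)), where d is
    the number of possible hash values (2**256 for SHA-256).
    """
    # ln(1-p) and ln((d-1)/d) = ln(1 - 1/d), both via log1p for accuracy.
    ratio = math.log1p(-p) / math.log1p(-1.0 / d)
    return 0.5 + math.sqrt(0.25 + 2.0 * ratio)


if __name__ == "__main__":
    # Classic birthday problem as a sanity check: about 23 people for p = 0.5.
    print(blocks_for_probability(0.5, 365))
    # SHA-256: number of blocks needed for an even chance of a collision.
    print(blocks_for_probability(0.5, 2**256))
```

With d = 365 and p = 0.5 this returns a value close to 23, the familiar birthday-problem answer, which is a quick way to check that the formula was transcribed correctly.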

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-11 Thread Edward Ned Harvey
For anyone who still cares: I'm calculating the odds of a sha256 collision in an extremely large zpool, containing 2^35 blocks of data, and no repetitions. The formula on wikipedia for the birthday problem is: p(n;d) ~= 1-( (d-1)/d )^( 0.5*n*(n-1) ) In this case, n=2^35 d=2^256 The problem is,
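
The difficulty with evaluating this formula on an ordinary calculator is that (d-1)/d rounds to exactly 1 in double precision. One way around that, sketched here under the assumption that standard Python floats are acceptable, is to work with the logarithm of the "no collision" term using `log1p` and `expm1`. The function name `collision_probability` is illustrative only and does not come from the thread.

```python
import math


def collision_probability(n: int, d: int) -> float:
    """Approximate birthday-problem collision probability.

    Evaluates p(n; d) ~= 1 - ((d-1)/d) ** (0.5 * n * (n-1)) without
    losing precision when d is huge (e.g. 2**256).
    """
    pairs = n * (n - 1) // 2  # exact integer count of block pairs
    # ln((d-1)/d) = ln(1 - 1/d); log1p keeps the tiny value accurate.
    log_no_collision = pairs * math.log1p(-1.0 / d)
    # 1 - exp(x) computed as -expm1(x) to avoid cancellation near zero.
    return -math.expm1(log_no_collision)


if __name__ == "__main__":
    # The case discussed in the post: n = 2^35 blocks, d = 2^256 hash values.
    print(collision_probability(2**35, 2**256))
```

For n = 2^35 and d = 2^256 this comes out at roughly 5.1E-57, in line with the figure quoted elsewhere in the thread.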

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-11 Thread Edward Ned Harvey
> From: Lassi Tuura [mailto:l...@cern.ch] > > bc -l <<EOF > scale=150 > define bday(n, h) { return 1 - e(-(n^2)/(2*h)); } > bday(2^35, 2^256) > bday(2^35, 2^256) * 10^57 > EOF > > Basically, ~5.1 * 10^-57. > > Seems your number was correct, although I am not sure how you arrived at > it. The number w

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-11 Thread Lassi Tuura
Hey there, >> ~= 5.1E-57 > > Bah. My math is wrong. I was never very good at P&S. I'll ask someone at > work tomorrow to look at it and show me the folly. Wikipedia has it right, > but I can't evaluate numbers to the few-hundredth power in any calculator > that I have handy. bc -l

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Edward Ned Harvey > > ~= 5.1E-57 Bah. My math is wrong. I was never very good at P&S. I'll ask someone at work tomorrow to look at it and show me the folly. Wikipedia has it right, but I c

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of David Magda > > Knowing exactly how the math (?) works is not necessary, but understanding Understanding the math is not necessary, but it is pretty easy. And unfortunately it becomes kind of

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Edward Ned Harvey
> From: Pawel Jakub Dawidek [mailto:p...@freebsd.org] > > Well, I find it quite reasonable. If your block is referenced 100 times, > it is probably quite important. If your block is referenced 1 time, it is probably quite important. Hence redundancy in the pool. > There are many corruption po

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Peter Taps > > I haven't looked at the link that talks about the probability of collision. > Intuitively, I still wonder how the chances of collision can be so low. We are > reducing a 4K block

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread David Magda
On Mon, January 10, 2011 02:41, Eric D. Mudama wrote: > On Sun, Jan 9 at 22:54, Peter Taps wrote: >> Thank you all for your help. I am the OP. >> >> I haven't looked at the link that talks about the probability of >> collision. Intuitively, I still wonder how the chances of collision >> can be so

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Pawel Jakub Dawidek
On Sat, Jan 08, 2011 at 12:59:17PM -0500, Edward Ned Harvey wrote: > Has anybody measured the cost of enabling or disabling verification? Of course there is no easy answer:) Let me explain how verification works exactly first. You try to write a block. You see that block is already in dedup tabl

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Robert Milkowski
On 01/ 8/11 05:59 PM, Edward Ned Harvey wrote: Has anybody measured the cost of enabling or disabling verification? The cost of disabling verification is an infinitesimally small number multiplied by possibly all your data. Basically lim->0 times lim->infinity. This can only be evaluated on a

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-10 Thread Pawel Jakub Dawidek
On Sun, Jan 09, 2011 at 07:27:52PM -0500, Edward Ned Harvey wrote: > > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > > boun...@opensolaris.org] On Behalf Of Pawel Jakub Dawidek > > > > Dedupditto doesn't work exactly that way. You can have at most 3 copies > > of your block. Ded

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-09 Thread Eric D. Mudama
On Sun, Jan 9 at 22:54, Peter Taps wrote: Thank you all for your help. I am the OP. I haven't looked at the link that talks about the probability of collision. Intuitively, I still wonder how the chances of collision can be so low. We are reducing a 4K block to just 256 bits. If the chances of

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-09 Thread Peter Taps
Thank you all for your help. I am the OP. I haven't looked at the link that talks about the probability of collision. Intuitively, I still wonder how the chances of collision can be so low. We are reducing a 4K block to just 256 bits. If the chances of collision are so low, *theoretically* it i

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-09 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Pawel Jakub Dawidek > > Dedupditto doesn't work exactly that way. You can have at most 3 copies > of your block. Dedupditto minimal value is 100. The first copy is > created on first write, the

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-09 Thread Pawel Jakub Dawidek
On Fri, Jan 07, 2011 at 03:06:26PM -0800, Brandon High wrote: > On Fri, Jan 7, 2011 at 11:33 AM, Robert Milkowski wrote: > > end-up with the block A. Now if B is relatively common in your data set you > > have a relatively big impact on many files because of one corrupted block > > (additionally f

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-08 Thread Bob Friesenhahn
On Thu, 6 Jan 2011, David Magda wrote: If you're not worried about disk read errors (and/or are not experiencing them), then you shouldn't be worried about hash collisions. Except for the little problem that if there is a collision then there will always be a collision for the same data and it

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-08 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Robert Milkowski > > What if you are storing lots of VMDKs? > One corrupted block which is shared among hundreds of VMDKs will affect > all of them. > And it might be a block containing met

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-08 Thread Robert Milkowski
On 01/ 7/11 09:02 PM, Pawel Jakub Dawidek wrote: On Fri, Jan 07, 2011 at 07:33:53PM +, Robert Milkowski wrote: Now what if block B is a meta-data block? Metadata is not deduplicated. Good point but then it depends on a perspective. What if you are storing lots of VMDKs? One corrupte

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Brandon High
On Fri, Jan 7, 2011 at 11:33 AM, Robert Milkowski wrote: > end-up with the block A. Now if B is relatively common in your data set you > have a relatively big impact on many files because of one corrupted block > (additionally from a fs point of view this is a silent data corruption). > Without de

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Pawel Jakub Dawidek
On Fri, Jan 07, 2011 at 07:33:53PM +, Robert Milkowski wrote: > On 01/ 7/11 02:13 PM, David Magda wrote: > > > >Given the above: most people are content enough to trust Fletcher to not > >have data corruption, but are worried about SHA-256 giving 'data > >corruption' when it comes de-dupe? The

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread David Magda
On Fri, January 7, 2011 14:33, Robert Milkowski wrote: > On 01/ 7/11 02:13 PM, David Magda wrote: >> >> Given the above: most people are content enough to trust Fletcher to not >> have data corruption, but are worried about SHA-256 giving 'data >> corruption' when it comes de-dupe? The entire res

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Robert Milkowski
On 01/ 7/11 02:13 PM, David Magda wrote: Given the above: most people are content enough to trust Fletcher to not have data corruption, but are worried about SHA-256 giving 'data corruption' when it comes de-dupe? The entire rest of the computing world is content to live with 10^-15 (for SAS di

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Casper . Dik
>On Fri, January 7, 2011 01:42, Michael DeMan wrote: >> Then - there is the other side of things. The 'black swan' event. At >> some point, given percentages on a scenario like the example case above, >> one simply has to make the business justification case internally at their >> own company ab

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Nicolas Williams
On Fri, Jan 07, 2011 at 06:39:51AM -0800, Michael DeMan wrote: > On Jan 7, 2011, at 6:13 AM, David Magda wrote: > > The other thing to note is that by default (with de-dupe disabled), ZFS > > uses Fletcher checksums to prevent data corruption. Add also the fact all > > other file systems don't have

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Michael DeMan
On Jan 7, 2011, at 6:13 AM, David Magda wrote: > On Fri, January 7, 2011 01:42, Michael DeMan wrote: >> Then - there is the other side of things. The 'black swan' event. At >> some point, given percentages on a scenario like the example case above, >> one simply has to make the business justifi

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread David Magda
On Fri, January 7, 2011 01:42, Michael DeMan wrote: > Then - there is the other side of things. The 'black swan' event. At > some point, given percentages on a scenario like the example case above, > one simply has to make the business justification case internally at their > own company about wh

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread David Magda
On Fri, January 7, 2011 04:26, Darren J Moffat wrote: > On 06/01/2011 23:07, David Magda wrote: > >> Would running on recent T-series servers, which have on-die crypto >> units, help any in this regard? > > The on chip SHA-256 implementation is not yet used see: > > http://blogs.sun.com/darren

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Bakul Shah > > See http://en.wikipedia.org/wiki/Birthday_problem -- in > particular see section 5.1 and the probability table of > section 3.4. They say "The expected number of n-bit hashes th

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Sašo Kiselkov
On 01/07/2011 01:15 PM, Darren J Moffat wrote: > On 07/01/2011 11:56, Sašo Kiselkov wrote: >> On 01/07/2011 10:26 AM, Darren J Moffat wrote: >>> On 06/01/2011 23:07, David Magda wrote: On Jan 6, 2011, at 15:57, Nicolas Williams wrote: > Fletcher is faster than SHA-256, so I think that

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Darren J Moffat
On 07/01/2011 11:56, Sašo Kiselkov wrote: On 01/07/2011 10:26 AM, Darren J Moffat wrote: On 06/01/2011 23:07, David Magda wrote: On Jan 6, 2011, at 15:57, Nicolas Williams wrote: Fletcher is faster than SHA-256, so I think that must be what you're asking about: "can Fletcher+Verification be f

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Sašo Kiselkov
On 01/07/2011 10:26 AM, Darren J Moffat wrote: > On 06/01/2011 23:07, David Magda wrote: >> On Jan 6, 2011, at 15:57, Nicolas Williams wrote: >> >>> Fletcher is faster than SHA-256, so I think that must be what you're >>> asking about: "can Fletcher+Verification be faster than >>> Sha256+NoVerifica

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Darren J Moffat
On 06/01/2011 23:07, David Magda wrote: On Jan 6, 2011, at 15:57, Nicolas Williams wrote: Fletcher is faster than SHA-256, so I think that must be what you're asking about: "can Fletcher+Verification be faster than Sha256+NoVerification?" Or do you have some other goal? Would running on rece

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-07 Thread Bakul Shah
On Thu, 06 Jan 2011 22:42:15 PST Michael DeMan wrote: > To be quite honest, I too am skeptical about using de-dupe just based on > SHA256. In prior posts it was asked that the potential adopter of the > technology provide the mathematical reason to NOT use SHA-256 only. However, if > O

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Michael DeMan
At the end of the day this issue essentially is about mathematical improbability versus certainty? To be quite honest, I too am skeptical about using de-dupe just based on SHA256. In prior posts it was asked that the potential adopter of the technology provide the mathematical reason to

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Michael Sullivan
Ed, with all due respect to your math, I've seen rsync bomb due to an SHA256 collision, so I know it can and does happen. I respect my data, so even with checksumming and comparing the block size, I'll still do a comparison check if those two match. You will end up with silent data corruption

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Peter Taps > > Perhaps (Sha256+NoVerification) would work 99.99% of the time. But Append 50 more 9's on there. 99.% See below. >

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Nicolas Williams
On Thu, Jan 06, 2011 at 06:07:47PM -0500, David Magda wrote: > On Jan 6, 2011, at 15:57, Nicolas Williams wrote: > > > Fletcher is faster than SHA-256, so I think that must be what you're > > asking about: "can Fletcher+Verification be faster than > > Sha256+NoVerification?" Or do you have some o

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread David Magda
On Jan 6, 2011, at 15:57, Nicolas Williams wrote: > Fletcher is faster than SHA-256, so I think that must be what you're > asking about: "can Fletcher+Verification be faster than > Sha256+NoVerification?" Or do you have some other goal? Would running on recent T-series servers, which have o

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Nicolas Williams
On Thu, Jan 06, 2011 at 11:44:31AM -0800, Peter Taps wrote: > I have been told that the checksum value returned by Sha256 is almost > guaranteed to be unique. All hash functions are guaranteed to have collisions [for inputs larger than their output anyways]. > In fact, if

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Richard Elling
On Jan 6, 2011, at 11:44 AM, Peter Taps wrote: > Folks, > > I have been told that the checksum value returned by Sha256 is almost > guaranteed to be unique. In fact, if Sha256 fails in some case, we have a > bigger problem such as memory corruption, etc. Essentially, adding > verification to s

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Robert Milkowski
On 01/ 6/11 07:44 PM, Peter Taps wrote: Folks, I have been told that the checksum value returned by Sha256 is almost guaranteed to be unique. In fact, if Sha256 fails in some case, we have a bigger problem such as memory corruption, etc. Essentially, adding verification to sha256 is an overk

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread David Magda
On Thu, January 6, 2011 14:44, Peter Taps wrote: > I have been told that the checksum value returned by Sha256 is almost > guaranteed to be unique. In fact, if Sha256 fails in some case, we have a > bigger problem such as memory corruption, etc. Essentially, adding > verification to sha256 is an ov

[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Peter Taps
Folks, I have been told that the checksum value returned by Sha256 is almost guaranteed to be unique. In fact, if Sha256 fails in some case, we have a bigger problem such as memory corruption, etc. Essentially, adding verification to sha256 is an overkill. Perhaps (Sha256+NoVerification) would