> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of David Magda
>
> Knowing exactly how the math (?) works is not necessary, but understanding

Understanding the math is not necessary, but it is pretty easy.  And
unfortunately it becomes kind of necessary because even when you tell
somebody the odds of a collision are a zillion times smaller than the odds
of our Sun exploding and destroying Earth, they still don't believe you.

The explanation of the math, again, is in the Wikipedia article
"Birthday Problem," or, stated a little more simply:

Given a finite pool of N items, pick one at random and return it to the pool.
Pick another one.  The odds of it being the same as the first are 1/N.
Pick another one.  The odds of it being the same as the first are 1/N, and
the odds of it being the same as the 2nd are 1/N.  So the odds of it
matching either of the prior picks are at most 2/N (slightly less, since
the first two picks might already match each other).
Pick another one.  The odds of it being the same as any previous pick are
at most 3/N.
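
To see that pattern in action, here is a small Monte Carlo sketch in
Python.  The pool size, number of draws, and function names are made-up
illustrative values, not anything from ZFS; it simply counts how often a
repeat shows up among the first pick plus M more, and compares that with
the running total 1/N + 2/N + ... + M/N.

```python
import random


def simulate_collision_rate(pool_size: int, m: int, trials: int, seed: int = 0) -> float:
    """Estimate the chance that the first pick plus M more picks (each
    returned to the pool) contain at least one repeated item."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    hits = 0
    for _ in range(trials):
        seen = set()
        for _ in range(m + 1):  # the first draw plus M more
            item = rng.randrange(pool_size)
            if item in seen:  # this pick matches some earlier pick
                hits += 1
                break
            seen.add(item)
    return hits / trials


def union_bound(pool_size: int, m: int) -> float:
    """1/N + 2/N + ... + M/N: the k-th extra pick has k earlier picks to match."""
    return sum(range(1, m + 1)) / pool_size


if __name__ == "__main__":
    n, m = 10_000, 19  # illustrative values, far smaller than any hash space
    print("simulated: ", simulate_collision_rate(n, m, trials=20_000))
    print("sum of k/N:", union_bound(n, m))
```

With a pool this small the simulated rate lands close to the simple sum,
which is the whole point: the sum is a slight overestimate, not a wild one.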

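If you draw M more items out of the pool after the first draw, returning
each one, then the odds of any draw matching any other draw are at most:
P = 1/N + 2/N + 3/N + ... + M/N
P = ( sum(1 to M) ) / N
The overcounting only matters when P is large; when P is tiny, as it is
for hashes, this bound is essentially exact.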

Note:  If you google for "sum of positive integers," you'll find
sum(1 to M) = M * (M+1) / 2, so

P = M * (M+1) / (2N)
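
A sketch using exact fractions that checks the closed form against the
explicit sum, and against the exact probability that all M+1 picks are
distinct.  The 365/22 values are just the familiar birthday example, not
ZFS numbers, and the function names are hypothetical.  With P that large
the closed form overshoots noticeably; with a huge pool the two agree.

```python
from fractions import Fraction
from math import prod


def collision_bound(n: int, m: int) -> Fraction:
    """Closed form P = M*(M+1) / (2N) for M draws after the first."""
    return Fraction(m * (m + 1), 2 * n)


def collision_exact(n: int, m: int) -> Fraction:
    """Exact chance that M+1 draws with replacement are not all distinct."""
    all_distinct = prod(Fraction(n - k, n) for k in range(m + 1))
    return 1 - all_distinct


if __name__ == "__main__":
    n, m = 365, 22  # 23 people, 365 possible birthdays
    # the closed form equals the explicit sum 1/N + 2/N + ... + M/N
    assert collision_bound(n, m) == sum(Fraction(k, n) for k in range(1, m + 1))
    print("bound:", float(collision_bound(n, m)))  # about 0.693
    print("exact:", float(collision_exact(n, m)))  # about 0.507
```

The exact product gets expensive for realistic block counts, which is why
the simple bound is the practical tool for hash-collision estimates.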

In the context of hash collisions in a zpool, M is (one less than) the
number of data blocks in your zpool, and N is the number of possible hash
values.  A SHA-256 hash has 256 bits, so N = 2^256.

I described an excessively large worst-case zpool in my other email, which
had 2^35 data blocks in it.  So...
M = 2^35

So the probability of any block hash colliding with any other hash in that
case is
2^35 * (2^35+1) / (2*2^256)
= ( 2^70 + 2^35 ) * 2^-257
= 2^-187 + 2^-222
~= 5.1E-57
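
The arithmetic above can be double-checked with exact integer math in
Python, so no floating-point rounding sneaks in until the final print:

```python
from fractions import Fraction

M = 2 ** 35   # data blocks in the worst-case pool described above
N = 2 ** 256  # number of distinct SHA-256 values

# P = M * (M+1) / (2N), computed exactly
p = Fraction(M * (M + 1), 2 * N)

# the exact value splits into the two powers of two from the hand calculation
assert p == Fraction(1, 2 ** 187) + Fraction(1, 2 ** 222)

print(f"P ~= {float(p):.2e}")  # prints P ~= 5.10e-57
```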

There are an estimated 8.87E49 atoms in planet Earth
(http://pages.prodigy.net/jhonig/bignum/qaearth.html).

So the probability of a collision in your unrealistic worst-case dataset,
as described, is roughly 2 million times smaller than the probability of
picking one specific atom out of the whole planet Earth by pure luck
(1 / 8.87E49 ~= 1.1E-50).
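
The atom comparison can be checked the same way, taking the 8.87E49
estimate at face value (ATOMS_IN_EARTH is just an illustrative name):

```python
ATOMS_IN_EARTH = 8.87e49                   # estimate from the page cited above
p_atom = 1 / ATOMS_IN_EARTH                # guessing one specific atom in a single try
p_collision = 2.0 ** -187 + 2.0 ** -222    # worst-case zpool result, M = 2^35

# how many times less likely the hash collision is than the atom guess
ratio = p_atom / p_collision
print(f"collision is {ratio:.1e} times less likely")
```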

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
