Hi Nicolas,
On 6/3/19, 7:48 am, "Nicolas Paris" wrote:
Hi Huon
Good catch. A 64-bit hash is definitely a useful function.
> the birthday paradox implies >50% chance of at least one for tables larger
> than 77000 rows
Do you know how many rows it takes to reach a 50% chance of a collision with a 64-bit hash?
About the seed column, to me there is no need for such an argume
Hi,
I’m working on something that requires deterministic randomness, i.e. a row
gets the same “random” value no matter the order of the DataFrame. A seeded
hash seems to be the perfect way to do this, but the existing hashes have
various limitations:
- hash: 32-bit output (only 4 billion possi
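
To illustrate what "deterministic randomness" from a seeded hash means here, below is a minimal pure-Python sketch. It is not Spark's API and not the implementation discussed in this thread: it swaps in keyed BLAKE2b from the standard library as a stand-in for a seeded 64-bit hash, and the names `seeded_hash64` and `seeded_uniform` are hypothetical.

```python
import hashlib
import struct


def seeded_hash64(value: str, seed: int) -> int:
    """64-bit hash of `value`, with `seed` (0 <= seed < 2**64) used as the key.
    Assumption: keyed BLAKE2b stands in for a seeded 64-bit hash."""
    key = struct.pack("<Q", seed)  # encode the seed as an 8-byte key
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


def seeded_uniform(value: str, seed: int) -> float:
    """Map the hash to a float in [0, 1) using its top 53 bits."""
    return (seeded_hash64(value, seed) >> 11) / float(1 << 53)


rows = ["alice", "bob", "carol"]
forward = {r: seeded_uniform(r, seed=42) for r in rows}
backward = {r: seeded_uniform(r, seed=42) for r in reversed(rows)}
# Each row gets the same value regardless of the order it is processed in
print(forward == backward)
```

Because the value depends only on the row's content and the seed, reordering or repartitioning the data does not change it, while a different seed yields a different pseudo-random assignment.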