://github.com/apache/spark/pull/24019.
- Huon
From: Reynold Xin
Date: Thursday, 7 March 2019 at 6:33 pm
To: "Wilson, Huon (Data61, Eveleigh ATP)"
Cc: "dev@spark.apache.org"
Subject: Re: [SQL] hash: 64-bits and seeding
Rather than calling it hash64, it'd be better to just call it xxhash64. The
reason is that ten years from now we will probably look back and laugh at a
specific hash implementation. It'd be better to just name the expression for
what it is.
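
For the use case in the quoted message below (a per-row "random" value that
does not depend on row order), the idea of a seeded 64-bit hash can be sketched
in plain Python. This is an illustration only, under stated assumptions: it
uses hashlib.blake2b from the standard library as a stand-in keyed hash, not
xxHash and not Spark's implementation, and the names seeded_hash64 and
deterministic_uniform are hypothetical.

```python
import hashlib
import struct


def seeded_hash64(value: str, seed: int) -> int:
    """Return an unsigned 64-bit hash of `value` that depends on `seed`.

    Assumption: BLAKE2b with an 8-byte key stands in for a seeded hash;
    this is not xxHash and will not match Spark's output.
    """
    key = struct.pack("<Q", seed & 0xFFFFFFFFFFFFFFFF)
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


def deterministic_uniform(value: str, seed: int) -> float:
    """Map the hash to a float in [0, 1) using its top 53 bits."""
    return (seeded_hash64(value, seed) >> 11) / float(1 << 53)


# The value for a row depends only on the row and the seed, not on position.
rows = ["alice", "bob", "carol"]
forward = {r: deterministic_uniform(r, seed=42) for r in rows}
backward = {r: deterministic_uniform(r, seed=42) for r in reversed(rows)}
print(forward == backward)
```

Changing the seed gives a different, equally repeatable assignment, which is
what makes a seeded hash attractive here compared with an order-dependent
random number generator.
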
On Wed, Mar 06, 2019 at 7:59 PM, < huon.wil...@data61.csir
Hi,
I’m working on something that requires deterministic randomness, i.e. a row
gets the same “random” value no matter the order of the DataFrame. A seeded
hash seems to be the perfect way to do this, but the existing hashes have
various limitations:
- hash: 32-bit output (only 4 billion possi