Thanks for the guidance. That was my initial inclination, but I decided that
consistency with the existing ‘hash’ was better. However, like you, I
prefer the specific form.
I’ve opened https://issues.apache.org/jira/browse/SPARK-27099 and submitted the
patch (using ‘xxhash64’) at https://g
Hi,
I’m working on something that requires deterministic randomness, i.e. each row
gets the same “random” value regardless of how the DataFrame is ordered. A seeded
hash seems to be the perfect way to do this, but the existing hashes have
various limitations:
- hash: 32-bit output (only 4 billion possi