Re: [SQL] hash: 64-bits and seeding

2019-03-07 Thread Huon.Wilson

Thanks for the guidance. That was my initial inclination, but I decided that consistency with the existing ‘hash’ was better. However, like you, I also prefer the specific form. I’ve opened https://issues.apache.org/jira/browse/SPARK-27099 and submitted the patch (using ‘xxhash64’) at https://g

[SQL] hash: 64-bits and seeding

2019-03-06 Thread Huon.Wilson

Hi, I’m working on something that requires deterministic randomness, i.e. a row gets the same “random” value no matter the order of the DataFrame. A seeded hash seems to be the perfect way to do this, but the existing hashes have various limitations:
- hash: 32-bit output (only 4 billion possi
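
To make the idea of "deterministic randomness" concrete, here is a minimal sketch in plain Python. It is not Spark's `hash` and not the `xxhash64` patch mentioned in the thread: it swaps in keyed BLAKE2b from the standard library as a stand-in 64-bit seeded hash, and the name `seeded_unit_value` and the string row keys are hypothetical, chosen only for illustration.

```python
import hashlib
import struct


def seeded_unit_value(row_key: str, seed: int) -> float:
    """Map (row_key, seed) to a repeatable value in [0, 1).

    Assumption: keyed BLAKE2b stands in for a seeded 64-bit hash.
    The seed must fit in an unsigned 64-bit integer.
    """
    seed_bytes = struct.pack("<Q", seed)  # 8-byte little-endian seed, used as the hash key
    digest = hashlib.blake2b(
        row_key.encode("utf-8"), digest_size=8, key=seed_bytes
    ).digest()
    (h,) = struct.unpack("<Q", digest)  # 64-bit unsigned hash value
    # Keep the top 53 bits so the float division is exact and stays below 1.0.
    return (h >> 11) / 2**53


# The value depends only on the row's own key and the seed,
# so processing rows in a different order gives the same result per row.
keys = ["row-1", "row-2", "row-3"]
forward = {k: seeded_unit_value(k, seed=7) for k in keys}
backward = {k: seeded_unit_value(k, seed=7) for k in reversed(keys)}
```

Because each value is computed from the row's content alone, `forward` and `backward` hold the same value for every key, and changing the seed gives a different but equally repeatable assignment. A 64-bit output also leaves far more room than a 32-bit one before two rows are likely to collide.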