Yibo Cai created SPARK-50842:
--------------------------------

             Summary: Replace Murmur3_x86_32 with Murmur_x64_32
                 Key: SPARK-50842
                 URL: https://issues.apache.org/jira/browse/SPARK-50842
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.5.5
            Reporter: Yibo Cai

MurmurHash3 has two variants:
- The x86 version generates a 32-bit hash value and processes 4 bytes in each iteration.
- The x64 version generates a 128-bit hash value and processes 16 bytes in each iteration.

Spark uses [Murmur3_x86_32|https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java].

The x64 version of MurmurHash3 runs much faster than the x86 version on 64-bit platforms. We can simply truncate the 128-bit hash value to 32 bits (or XOR its four 32-bit words together) to stay compatible with the current code, without losing hashing effectiveness.

I observed a small yet stable performance improvement on some TPC-DS benchmarks after replacing the x86 Murmur hash with the x64 version.

Is it okay to replace the x86 Murmur hash with the x64 version? Are there any possible problems with this change, e.g. compatibility?
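
To make the two reduction options concrete, here is a minimal sketch of folding a 128-bit hash down to 32 bits. It is not Spark code. The class and method names (HashFold, truncate, xorFold) are hypothetical, and it assumes the 128-bit result arrives as two 64-bit halves h1 and h2. The issue does not say which 32 bits truncation would keep, so the low word of h1 is an assumption.

```java
// Hypothetical helper, not part of Spark: reduces a 128-bit hash,
// given as two 64-bit halves, to a 32-bit value.
final class HashFold {
    private HashFold() {}

    // Option 1: truncate. Keeps only the low 32 bits of the first half
    // (which word to keep is an assumption; the issue does not specify it).
    static int truncate(long h1, long h2) {
        return (int) h1;
    }

    // Option 2: XOR the four 32-bit words of the 128-bit value together.
    static int xorFold(long h1, long h2) {
        long x = h1 ^ h2;            // combine the two 64-bit halves
        return (int) (x ^ (x >>> 32)); // fold the high word into the low word
    }

    public static void main(String[] args) {
        // Placeholder inputs whose four 32-bit words are 1, 2, 4 and 8.
        long h1 = 0x0000000100000002L;
        long h2 = 0x0000000400000008L;
        System.out.println(truncate(h1, h2)); // prints 2
        System.out.println(xorFold(h1, h2));  // prints 15 (1 ^ 2 ^ 4 ^ 8)
    }
}
```

Either reduction returns an int, so callers that expect a 32-bit hash keep the same signature, but the values differ from what Murmur3_x86_32 produces for the same input.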