Yibo Cai created SPARK-50842:
--------------------------------

             Summary: Replace Murmur3_x86_32 with Murmur_x64_32
                 Key: SPARK-50842
                 URL: https://issues.apache.org/jira/browse/SPARK-50842
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.5.5
            Reporter: Yibo Cai


Two MurmurHash3 variants are relevant here:
- The x86 version generates a 32-bit hash value and processes 4 bytes in each 
iteration.
- The x64 version generates a 128-bit hash value and processes 16 bytes in each 
iteration.

Spark uses 
[Murmur3_x86_32|https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java].
 The x64 version of MurmurHash3 runs much faster than the x86 version on 64-bit 
platforms. We can simply truncate the 128-bit hash value to 32 bits (or XOR its 
four 32-bit words) to stay compatible with the current code, without losing 
hashing effectiveness.
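
As a rough illustration of the two folding options mentioned above, here is a 
minimal sketch. It assumes the 128-bit x64 result is available as two 64-bit 
halves (h1, h2); the class and method names (HashFold, truncate, xorFold) are 
hypothetical and are not part of Spark. It does not compute the Murmur hash 
itself, only the reduction to 32 bits.

```java
// Hypothetical helper, not Spark code: reduces a 128-bit hash, given as two
// 64-bit halves, to a 32-bit value.
final class HashFold {
    private HashFold() {}

    // Option 1: truncate, keeping only the low 32 bits of the first half.
    static int truncate(long h1, long h2) {
        return (int) h1;
    }

    // Option 2: XOR the four 32-bit words of the 128-bit value together.
    static int xorFold(long h1, long h2) {
        long x = h1 ^ h2;              // combine the two 64-bit halves
        return (int) (x ^ (x >>> 32)); // combine the high and low 32-bit words
    }

    public static void main(String[] args) {
        long h1 = 0x1122334455667788L; // arbitrary sample values, not real hashes
        long h2 = 0x0123456789ABCDEFL;
        System.out.println(Integer.toHexString(truncate(h1, h2)));
        System.out.println(Integer.toHexString(xorFold(h1, h2)));
    }
}
```

Either option keeps the 32-bit int type that callers of Murmur3_x86_32 expect, 
but the resulting values differ from the ones the x86 version produces for the 
same input.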

I observed a small but stable performance improvement on some TPC-DS benchmarks 
after replacing the x86 Murmur hash with the x64 version.

Is it okay to replace the x86 Murmur hash with the x64 version? Are there any 
possible problems with this change, e.g., compatibility?



