[ 
https://issues.apache.org/jira/browse/KAFKA-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass updated KAFKA-10650:
----------------------------------------
    Labels: cloudera  (was: )

> Use Murmur3 hashing instead of MD5 in SkimpyOffsetMap
> -----------------------------------------------------
>
>                 Key: KAFKA-10650
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10650
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Viktor Somogyi-Vass
>            Assignee: Viktor Somogyi-Vass
>            Priority: Major
>              Labels: cloudera
>         Attachments: benchmark-evidence.png, benchmark-run-output
>
>
> The use of MD5 was uncovered while testing Kafka for FIPS (Federal 
> Information Processing Standards) verification.
> While MD5 isn't a FIPS incompatibility here, as it isn't used for 
> cryptographic purposes, I spent some time on it because it isn't ideal either. 
> MD5 is a relatively fast cryptographic hashing algorithm, but there are much 
> better-performing algorithms for hash tables such as the one in SkimpyOffsetMap.
> By applying Murmur3 (which is implemented in Streams) I achieved a 3x 
> faster {{put}} operation, and overall segment cleaning sped up by 30%, 
> while preserving the same collision rate (both performed within 0.0015 - 
> 0.007, mostly with a 0.004 median).
> I chose Murmur3 because research paper [1] shows that Murmur2 is a 
> relatively good choice for hash tables. Based on this, and since Murmur3 is 
> available in the project, I used that. 
> [1]
> https://www.researchgate.net/publication/235663569_Performance_of_the_most_common_non-cryptographic_hash_functions
> Benchmark evidence (lower is better, as this is average time):
>  !benchmark-evidence.png! 
> The benchmark can be reproduced by running {{./jmh.sh LogCleanerBenchmark}} 
> from 
> https://github.com/viktorsomogyi/kafka/tree/KAFKA-3987-hash-algorithm-murmur 
> in the {{jmh-benchmark}} folder.
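
To illustrate the idea in the description, here is a minimal sketch of mapping a key to a hash table slot with either MD5 or a Murmur3 hash. It is not the code from the linked branch: the class name {{HashSlotSketch}} and its methods are hypothetical, the 32-bit Murmur3 variant is used only for brevity (which variant the patch uses is not stated here), and SkimpyOffsetMap's actual probing and storage layout are not reproduced.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not SkimpyOffsetMap and not Kafka's Murmur3 class.
final class HashSlotSketch {

    /** MurmurHash3, x86 32-bit variant, written out so the sketch is self-contained. */
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int len = data.length;
        int roundedEnd = len & ~3; // length rounded down to a multiple of 4

        // Body: mix in one little-endian 4-byte block at a time.
        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: the remaining 1-3 bytes (intentional switch fall-through).
        int k1 = 0;
        switch (len & 3) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization: force avalanche of the remaining bits.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** First four bytes of the MD5 digest as an int (an assumption made for comparison). */
    static int md5Int(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            return ByteBuffer.wrap(digest).getInt();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Maps a possibly negative hash value to a slot index in [0, slots). */
    static int slotFor(int hash, int slots) {
        return Math.floorMod(hash, slots);
    }
}
```

Either hash can feed {{slotFor}}, for example {{HashSlotSketch.slotFor(HashSlotSketch.murmur3_32(key, 0), slots)}}. The difference the issue describes is the cost of computing the hash, not the slot mapping.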



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
