[ https://issues.apache.org/jira/browse/KAFKA-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127131#comment-15127131 ]
Jiangjie Qin commented on KAFKA-3174:
-------------------------------------

[~ijuma] I modified the test a bit to generate different random bytes for every iteration. This should avoid the impact of the CPU's on-chip cache.

{code}
KCrc32: Size = 8 , avg time = 54 ns
JCrc32: Size = 8 , avg time = 65 ns
KCrc32: Size = 16 , avg time = 58 ns
JCrc32: Size = 16 , avg time = 78 ns
KCrc32: Size = 32 , avg time = 73 ns
JCrc32: Size = 32 , avg time = 95 ns
KCrc32: Size = 64 , avg time = 95 ns
JCrc32: Size = 64 , avg time = 113 ns
KCrc32: Size = 128 , avg time = 145 ns
JCrc32: Size = 128 , avg time = 142 ns
KCrc32: Size = 1024 , avg time = 850 ns
JCrc32: Size = 1024 , avg time = 532 ns
KCrc32: Size = 16384 , avg time = 12811 ns
JCrc32: Size = 16384 , avg time = 7200 ns
KCrc32: Size = 65536 , avg time = 51625 ns
JCrc32: Size = 65536 , avg time = 28756 ns
KCrc32: Size = 1048576 , avg time = 821661 ns
JCrc32: Size = 1048576 , avg time = 461750 ns
{code}

I think the results are the same as yours. It makes sense because CRC32 in SSE4.2 is a SIMD instruction, so it only helps when the gain from data parallelism outweighs the cost of merging the partial results.
http://www.drdobbs.com/parallel/fast-parallelized-crc-computation-using/229401411

So it seems to me that:
1. If compressed messages are used, we would probably like to go with the Java CRC32, because eventually we need to compute the CRC of the compressed message, which will very likely offset the earlier performance loss (if there is any) on each inner message.
2. If uncompressed messages are used, then it depends on the average message size. In theory we could decide dynamically which class to use once we know the message size, but I am not sure whether that would be over-engineering, because compared with the serialization/compression cost, the CRC computation seems almost negligible.

> Re-evaluate the CRC32 class performance.
> ----------------------------------------
>
>                 Key: KAFKA-3174
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3174
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.9.0.0
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.9.0.1
>
>
> We used org.apache.kafka.common.utils.Crc32 in clients because it has better
> performance than java.util.zip.CRC32 in Java 1.6.
> In a recent test I ran, it looks like in Java 1.8 the CRC32 class is 2x as fast as
> the Crc32 class we are using now. We may want to re-evaluate the performance
> of the Crc32 class and see whether it makes sense to simply use the Java CRC32 instead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
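
The measurement described in the comment above (fresh random bytes for every iteration, average time reported per input size) could be sketched roughly as follows for java.util.zip.CRC32. This is only a minimal sketch and not the test that was actually run: the class name Crc32Bench, the helper names, the iteration count and the nanoTime-based timing are all assumptions made for illustration. The Kafka Crc32 class is left out so that the example depends only on the JDK.

```java
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical harness; names and parameters are illustrative assumptions.
class Crc32Bench {
    // Written to after every iteration so the JIT cannot discard the checksum work.
    static volatile long sink;

    // Computes the java.util.zip.CRC32 checksum of the whole array.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Average nanoseconds per checksum over `iterations` runs on `size` bytes.
    static long avgNanos(int size, int iterations, Random random) {
        byte[] data = new byte[size];
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            random.nextBytes(data);            // new random content, outside the timed region
            long start = System.nanoTime();
            long value = checksum(data);
            total += System.nanoTime() - start;
            sink = value;
        }
        return total / iterations;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int[] sizes = {8, 16, 32, 64, 128, 1024, 16384, 65536, 1048576};
        for (int size : sizes) {
            avgNanos(size, 1000, random);      // warm-up pass, result ignored
            long avg = avgNanos(size, 1000, random);
            System.out.println("JCrc32: Size = " + size + " , avg time = " + avg + " ns");
        }
    }
}
```

Timing each call with System.nanoTime adds its own overhead, which matters most for the smallest sizes, so absolute numbers from a sketch like this should be read with caution. A dedicated harness such as JMH would be a more reliable way to compare the two classes.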