[ 
https://issues.apache.org/jira/browse/KAFKA-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127131#comment-15127131
 ] 

Jiangjie Qin commented on KAFKA-3174:
-------------------------------------

[~ijuma] I modified the test a bit to generate different random bytes for every
iteration, so that the CPU's on-chip cache does not skew the results.

{code}
KCrc32: Size = 8        , avg time = 54 ns
JCrc32: Size = 8        , avg time = 65 ns
KCrc32: Size = 16       , avg time = 58 ns
JCrc32: Size = 16       , avg time = 78 ns
KCrc32: Size = 32       , avg time = 73 ns
JCrc32: Size = 32       , avg time = 95 ns
KCrc32: Size = 64       , avg time = 95 ns
JCrc32: Size = 64       , avg time = 113 ns
KCrc32: Size = 128      , avg time = 145 ns
JCrc32: Size = 128      , avg time = 142 ns
KCrc32: Size = 1024     , avg time = 850 ns
JCrc32: Size = 1024     , avg time = 532 ns
KCrc32: Size = 16384    , avg time = 12811 ns
JCrc32: Size = 16384    , avg time = 7200 ns
KCrc32: Size = 65536    , avg time = 51625 ns
JCrc32: Size = 65536    , avg time = 28756 ns
KCrc32: Size = 1048576  , avg time = 821661 ns
JCrc32: Size = 1048576  , avg time = 461750 ns
{code}
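A rough sketch of the methodology above, for anyone who wants to reproduce it.
It assumes the implementations under test are exposed as java.util.zip.Checksum
(java.util.zip.CRC32 is; pass our Crc32 the same way if it implements Checksum).
CrcBench, avgNanos, the 32 MB pool and the iteration counts are all hypothetical
choices. Each iteration hashes a different slice of one large random buffer that
is filled before timing starts, so the data generation is not measured. For
publishable numbers a harness such as JMH would be more trustworthy than a
hand-rolled System.nanoTime loop.

{code:java}
import java.util.Random;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

final class CrcBench {
    // Accumulates results so the JIT cannot eliminate the checksum work.
    static long sink;

    /**
     * Average nanoseconds per checksum of `size` bytes. Each iteration hashes a
     * different slice of one random buffer (about `poolBytes` long), so
     * consecutive iterations do not reuse the same cache-resident bytes.
     */
    static double avgNanos(Supplier<Checksum> factory, int size, int iterations, int poolBytes) {
        byte[] pool = new byte[Math.max(poolBytes, size)];
        new Random(1234).nextBytes(pool); // generated before timing starts
        int slices = pool.length / size;
        Checksum checksum = factory.get();
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum.reset();
            checksum.update(pool, (i % slices) * size, size);
            sink += checksum.getValue();
        }
        return (double) (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        int[] sizes = {8, 16, 32, 64, 128, 1024, 16384, 65536, 1048576};
        int poolBytes = 32 << 20; // hypothetical: meant to exceed the CPU caches
        for (int size : sizes) {
            int iterations = Math.max(1_000, (64 << 20) / size);
            avgNanos(CRC32::new, size, iterations, poolBytes); // JIT warm-up pass
            double ns = avgNanos(CRC32::new, size, iterations, poolBytes);
            System.out.printf("JCrc32: Size = %-8d, avg time = %.0f ns%n", size, ns);
        }
    }
}
{code}

Timing the whole loop and dividing, rather than timing each call, avoids the
System.nanoTime resolution dominating the 8-byte and 16-byte cases.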

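I think the results are the same as yours: KCrc32 is faster for small sizes,
the two roughly tie at 128 bytes, and JCrc32 is about 1.6-1.8x faster from 1 KB
up. It makes sense because CRC32 in SSE4.2 is a SIMD instruction, so it only
helps when the gain from data parallelism outweighs the cost of merging the
partial results.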
http://www.drdobbs.com/parallel/fast-parallelized-crc-computation-using/229401411
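To make "merging the partial results" concrete: if a buffer is split into
pieces whose CRCs are computed independently, the partial CRC-32 values have to
be combined afterwards. Below is a minimal Java port of the GF(2) matrix method
used by zlib's crc32_combine (Crc32Combine, combine and crc are illustrative
names). It only illustrates what a merge step involves; it is not what the JDK
intrinsic or the approach in the linked article actually does.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class Crc32Combine {
    private static final int GF2_DIM = 32;

    // Multiply a 32x32 GF(2) matrix (one long per column) by a 32-bit vector.
    private static long gf2MatrixTimes(long[] mat, long vec) {
        long sum = 0;
        for (int i = 0; vec != 0; i++, vec >>>= 1) {
            if ((vec & 1) != 0) {
                sum ^= mat[i];
            }
        }
        return sum;
    }

    // square = mat * mat
    private static void gf2MatrixSquare(long[] square, long[] mat) {
        for (int n = 0; n < GF2_DIM; n++) {
            square[n] = gf2MatrixTimes(mat, mat[n]);
        }
    }

    /** CRC-32 of A||B, given crc1 = CRC(A), crc2 = CRC(B) and len2 = length of B in bytes. */
    static long combine(long crc1, long crc2, long len2) {
        if (len2 <= 0) {
            return crc1;
        }
        long[] even = new long[GF2_DIM];
        long[] odd = new long[GF2_DIM];
        odd[0] = 0xEDB88320L; // operator for one zero bit: reflected CRC-32 polynomial
        long row = 1;
        for (int n = 1; n < GF2_DIM; n++) {
            odd[n] = row;
            row <<= 1;
        }
        gf2MatrixSquare(even, odd); // operator for two zero bits
        gf2MatrixSquare(odd, even); // operator for four zero bits

        // Shift crc1 past len2 zero bytes by repeated squaring (first square = one byte).
        do {
            gf2MatrixSquare(even, odd);
            if ((len2 & 1) != 0) {
                crc1 = gf2MatrixTimes(even, crc1);
            }
            len2 >>= 1;
            if (len2 == 0) {
                break;
            }
            gf2MatrixSquare(odd, even);
            if ((len2 & 1) != 0) {
                crc1 = gf2MatrixTimes(odd, crc1);
            }
            len2 >>= 1;
        } while (len2 != 0);
        return (crc1 ^ crc2) & 0xFFFFFFFFL;
    }

    static long crc(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "The quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.US_ASCII);
        int split = 10;
        long whole = crc(data, 0, data.length);
        long merged = combine(crc(data, 0, split), crc(data, split, data.length - split),
                data.length - split);
        System.out.println(whole == merged); // prints true
    }
}
{code}

The merge here is a fixed overhead that grows with log2(len2), which is the kind
of cost that only pays off when the pieces are large.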

So it seems to me that:
1. If compressed messages are used, we probably want java.util.zip.CRC32,
because we eventually need to compute the CRC of the compressed wrapper message,
which covers the whole batch and so is usually large. That will very likely
offset any performance loss on the individual inner messages.
2. If uncompressed messages are used, it depends on the average message size.
In theory we could decide dynamically which class to use once we know the
message size, but that may be overthinking it: compared with the
serialization/compression cost, CRC computation seems almost negligible.
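For point 2, a size-based choice could look roughly like the sketch below. The
128-byte THRESHOLD_BYTES is a hypothetical value read off the numbers above
(where the two classes tie on this one machine) and would need re-measuring per
JVM and CPU. SizeAwareCrc and tableCrc are illustrative names; tableCrc is a
simple byte-at-a-time table CRC-32 standing in for our Crc32, not its actual
code.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class SizeAwareCrc {
    // Hypothetical crossover taken from one benchmark run; re-measure before use.
    static final int THRESHOLD_BYTES = 128;

    private static final int[] TABLE = new int[256];
    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[n] = c;
        }
    }

    // Byte-at-a-time table CRC-32 (stand-in for a pure-Java Crc32 class).
    static long tableCrc(byte[] data, int off, int len) {
        int crc = 0xFFFFFFFF;
        for (int i = off; i < off + len; i++) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ data[i]) & 0xFF];
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    static long jdkCrc(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    // Pick an implementation once the message size is known.
    static long checksum(byte[] data, int off, int len) {
        return len < THRESHOLD_BYTES ? tableCrc(data, off, len) : jdkCrc(data, off, len);
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("%08x%n", checksum(check, 0, check.length)); // prints cbf43926
    }
}
{code}

Both paths produce the standard CRC-32 value, so switching between them only
changes speed, never the checksum that ends up in the message.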




> Re-evaluate the CRC32 class performance.
> ----------------------------------------
>
>                 Key: KAFKA-3174
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3174
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.9.0.0
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.9.0.1
>
>
> We use org.apache.kafka.common.utils.Crc32 in clients because it had better
> performance than java.util.zip.CRC32 in Java 1.6.
> In a recent test I ran, it looks like in Java 1.8 the CRC32 class is 2x as fast
> as the Crc32 class we are using now. We may want to re-evaluate the performance
> of the Crc32 class and see whether it makes sense to simply use java CRC32 instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
