[ https://issues.apache.org/jira/browse/KAFKA-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532413#comment-13532413 ]
David Arthur commented on KAFKA-374:
------------------------------------

Akka seems like overkill for this (although it does have some nice properties). It would be interesting to refactor the threading in Kafka with Akka and see what kind of performance differences there are (certainly beyond the scope of this JIRA).

As for the CRC implementation, is there consensus on what to do here - Java or Scala? I say +1 for Java, since no one will need to modify this code and it doesn't really matter that it's not Scala. A rough sketch of what a table-driven pure-Java CRC32 could look like, plus a quick throughput harness, follows the issue description below.

> Move to java CRC32 implementation
> ---------------------------------
>
>                 Key: KAFKA-374
>                 URL: https://issues.apache.org/jira/browse/KAFKA-374
>             Project: Kafka
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Priority: Minor
>              Labels: newbie
>         Attachments: KAFKA-374-draft.patch, KAFKA-374.patch
>
>
> We keep a per-record CRC32. This is a fairly cheap algorithm, but the Java implementation uses JNI and it seems to be a bit expensive for small records. I have seen this before in Kafka profiles, and I noticed it in another application I was working on. Basically, with small records the native implementation can only checksum < 100 MB/sec. Hadoop has done some analysis of this and replaced it with a Java implementation that is 2x faster for large values and 5-10x faster for small values. Details are in HADOOP-6148. We should do a quick read/write benchmark on log and message set iteration and see if this improves things.
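For illustration, here is a minimal sketch of a table-driven pure-Java CRC32 behind the standard java.util.zip.Checksum interface. This is a single-table variant of the approach taken in HADOOP-6148 (Hadoop's PureJavaCrc32 adds multiple lookup tables and an unrolled inner loop for speed); the class name is hypothetical and this is not code from the attached patches.

{code:java}
import java.util.zip.Checksum;

/**
 * Minimal table-driven CRC-32 (IEEE polynomial, reflected form 0xEDB88320)
 * implementing java.util.zip.Checksum. A single 256-entry table keeps the
 * sketch short; Hadoop's PureJavaCrc32 gets its extra speed from multiple
 * tables and an unrolled inner loop.
 */
public class PureJavaCrc32Sketch implements Checksum {

    private static final int[] TABLE = new int[256];
    static {
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int j = 0; j < 8; j++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[i] = c;
        }
    }

    private int crc = 0xFFFFFFFF; // standard CRC-32 initial value

    @Override
    public void update(int b) {
        // per the Checksum contract, only the low eight bits of b are used
        crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
    }

    @Override
    public void update(byte[] b, int off, int len) {
        for (int i = off; i < off + len; i++) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ b[i]) & 0xFF];
        }
    }

    @Override
    public long getValue() {
        return (~crc) & 0xFFFFFFFFL; // final XOR, widened to an unsigned long
    }

    @Override
    public void reset() {
        crc = 0xFFFFFFFF;
    }
}
{code}

Because it uses the same polynomial, initial value, and final XOR as java.util.zip.CRC32, it produces identical checksums and could be swapped in anywhere a Checksum is expected.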
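And a rough harness for the "quick read/write benchmark" suggested in the description, under the assumption that raw checksum throughput on small records is the interesting case. This measures checksumming only (not log or message set iteration) and is not a rigorous benchmark, so treat the numbers as directional. CrcBench is a hypothetical name, and PureJavaCrc32Sketch is the class sketched above.

{code:java}
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/**
 * Rough throughput comparison of the JNI-backed java.util.zip.CRC32 and a
 * pure-Java implementation on small records. No JMH, no GC isolation; it
 * only gives a directional MB/sec figure.
 */
public class CrcBench {
    public static void main(String[] args) {
        byte[] record = new byte[100]; // small record, the case KAFKA-374 cares about
        new Random(42).nextBytes(record);
        int iterations = 5_000_000;

        // warm up both implementations so the JIT compiles the hot loops
        run(new CRC32(), record, iterations / 10);
        run(new PureJavaCrc32Sketch(), record, iterations / 10);

        report("java.util.zip.CRC32 (JNI)", run(new CRC32(), record, iterations),
               record.length, iterations);
        report("pure-Java CRC32", run(new PureJavaCrc32Sketch(), record, iterations),
               record.length, iterations);
    }

    // checksum the record `iterations` times, returning elapsed nanoseconds
    private static long run(Checksum crc, byte[] record, int iterations) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            crc.reset();
            crc.update(record, 0, record.length);
            sink ^= crc.getValue(); // keep the result live so the loop isn't dead code
        }
        if (sink == 42) System.out.println(); // defeat dead-code elimination
        return System.nanoTime() - start;
    }

    private static void report(String name, long nanos, int recordSize, int iterations) {
        double mbPerSec = ((double) recordSize * iterations / (1 << 20)) / (nanos / 1e9);
        System.out.printf("%-28s %8.1f MB/sec%n", name, mbPerSec);
    }
}
{code}

Varying the record size (100 bytes vs. a few KB) would show whether the 5-10x small-record speedup reported in HADOOP-6148 carries over to this workload.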