[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352227#comment-14352227 ]
Jay Kreps commented on KAFKA-527:
---------------------------------

The clients already use MemoryRecords, so 0.8.2 and 0.8.3 will give the speed-up to people using the clients. I think the question is how best to get the perf improvement to the server, which should be largely independent.

Guozhang is correct that moving the server to MemoryRecords should be our long-term plan and is the end state we want. However, the Message interface is fairly heavily used inside kafka.log, so this would be a very large change to those classes. We haven't had a real discussion about how we would go about this, and I don't think there is really a timeline.

Several options I see:

1. We could do Yasu and Guozhang's fixes now: they are limited in scope, compression is a pain point now, and we have lots of things in flight right now.

2. We could do a larger conversion of kafka.log to move it off Message/MessageSet/FileMessageSet/ByteBufferMessageSet, as Guozhang proposes. This would be a fairly big refactoring: a number of things are tied to the MessageSet interface and would all have to move, and there is a significant amount of test code to update. However, this is certainly where we want to end up.

3. We could decide that we actually prefer Java code and, given that a significant chunk of the common code has to be in Java, start moving chunks of the server as well. We had talked about this before, but I don't think we should start until we have a real plan to finish. If we did that, then instead of just migrating the server off Message/MessageSet/FileMessageSet/ByteBufferMessageSet, we would move the whole log subpackage to Java as the first step in a larger migration. The argument both for and against this is that instead of doing two rewrites, one to change interfaces and a second to move Scala => Java, we would do both at the same time.
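As a rough illustration of the kind of saving at stake (a standalone sketch, not Kafka's actual MemoryRecords code), compressed data can be written straight into a pre-allocated ByteBuffer through a thin OutputStream adapter, with the 4-byte size field reserved up front and backfilled afterwards instead of copying the payload just to prepend its length. The names `BufferBackedOutputStream` and `writeCompressed`, the choice of gzip, and the `[4-byte size][compressed bytes]` layout are all hypothetical and chosen only for illustration; they are not Kafka's wire format or API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class DirectCompressionSketch {

    /** Hypothetical adapter: an OutputStream that writes directly into a ByteBuffer, with no intermediate byte[] copy. */
    static final class BufferBackedOutputStream extends OutputStream {
        private final ByteBuffer buffer;

        BufferBackedOutputStream(ByteBuffer buffer) {
            this.buffer = buffer;
        }

        @Override
        public void write(int b) {
            buffer.put((byte) b);
        }

        @Override
        public void write(byte[] bytes, int off, int len) {
            buffer.put(bytes, off, len);
        }
    }

    /**
     * Hypothetical helper: compresses the payloads into {@code out} as [4-byte size][gzip bytes].
     * The size slot is reserved first and filled in at the end, so the compressed
     * bytes are never copied just to put their length in front of them.
     */
    static int writeCompressed(ByteBuffer out, byte[]... payloads) throws IOException {
        int sizePos = out.position();
        out.position(sizePos + 4); // reserve space for the 4-byte size field
        try (GZIPOutputStream gzip = new GZIPOutputStream(new BufferBackedOutputStream(out))) {
            for (byte[] p : payloads) {
                gzip.write(p);
            }
        } // close() finishes the gzip stream, writing its trailer into the buffer
        int compressedSize = out.position() - sizePos - 4;
        out.putInt(sizePos, compressedSize); // absolute put: leaves the position unchanged
        return compressedSize;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        byte[] msg = "hello kafka".getBytes(StandardCharsets.UTF_8);
        int size = writeCompressed(buf, msg, msg, msg);
        System.out.println("compressed bytes: " + size + ", size field: " + buf.getInt(0));
    }
}
```

The trade-off in this sketch is that the destination buffer must be large enough in advance: if it is too small, the ByteBuffer writes throw rather than grow, so a real implementation would need an estimation or buffer-growth strategy.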
> Compression support does numerous byte copies
> ---------------------------------------------
>
> Key: KAFKA-527
> URL: https://issues.apache.org/jira/browse/KAFKA-527
> Project: Kafka
> Issue Type: Bug
> Components: compression
> Reporter: Jay Kreps
> Assignee: Jay Kreps
> Priority: Critical
> Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely
> inefficient. We do something like 7 (?) complete copies of the data, often
> for simple things like adding a 4 byte size to the front. I am not sure how
> this went by unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the
> Message/MessageSet interfaces which are based on byte buffers is the cause of
> many of these.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)