We first need to decide on the right behavior before optimizing the implementation.
A few key goals that I would put forward are:

1. Decoupling the compression codec of the producer from that of the log.
2. Ensuring message validity on the server when it receives bytes. The iterator does this today, and it is important to keep bad data from creeping in.
3. A simple consumer implementation.
4. An implementation with good performance and efficiency.

With the above points in mind, I suggest we switch to Snappy as the default compression, optimize the code on the server end to avoid unnecessary copies, and remove producer-side compression entirely except for cross-DC sends. (Rough sketches of both the delta-offset idea and the copy-free Snappy path are included below the quoted thread.)

On 8/15/13 11:28 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote:

>Here is a comment from Guozhang on this issue. He posted it on the
>compression byte-copying issue, but it is really about not needing to do
>compression. His suggestion is interesting, though it ends up pushing more
>complexity into consumers.
>
>Guozhang Wang commented on KAFKA-527:
>-------------------------------------
>
>One alternative approach would be like this:
>
>Currently in the compression code (ByteBufferMessageSet.create), for each
>message we write 1) the incrementing logical offset as a LONG, 2) the
>message byte size as an INT, and 3) the message payload.
>
>The idea is that since the logical offset is just incrementing, with a
>compressed message, as long as we know the offset of the first message,
>we know the offsets of the rest of the messages without even reading the
>offset field.
>
>So we can skip reading the offset of each message inside the compressed
>message and read only the offset of the wrapper message, which is the
>offset of the last message + 1; then in assignOffsets we just modify the
>offset of the wrapper message. Another change would be on the consumer
>side: the iterator would need to be smart about interpreting the offsets
>of messages while deep-iterating the compressed message.
>
>As Jay pointed out, this method would not work with log compaction since
>it would break the assumption that offsets increment continuously. Two
>workarounds for this issue:
>
>1) In log compaction, instead of deleting the to-be-deleted message, just
>set its payload to null but keep its header, thereby keeping its slot in
>the incrementing offsets.
>2) During the compression process, instead of writing the absolute value
>of the logical offset of each message, write the delta of its offset
>relative to the offset of the wrapper message. So -1 would mean
>continuously decrementing from the wrapper message offset, and -2/-3/...
>would mean skipping holes inside the compressed message.
>
>
>On Fri, Aug 2, 2013 at 10:19 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> Chris commented in another thread about the poor compression performance
>> in 0.8, even with snappy.
>>
>> Indeed, if I run the linear log write throughput test on my laptop I see
>> 75MB/sec with no compression and 17MB/sec with snappy.
>>
>> This is a little surprising, as snappy claims 200MB/sec round-trip
>> performance (compress + uncompress) from Java. So what is going on?
>>
>> Well, you may remember I actually filed a bug a while back on all the
>> inefficient byte copying in the compression path (KAFKA-527). I didn't
>> think too much of it, other than that it is a bit sloppy, since after all
>> computers are good at copying bytes, right?
>>
>> Turns out not so much. If you look at a profile of the standalone log
>> test, you see that with no compression 80% of the time goes to
>> FileChannel.write, which is reasonable since that is what a log does.
>>
>> But with snappy enabled only 5% goes to writing data, 50% of the time
>> goes to byte copying and allocation, and only about 22% goes to actual
>> compression and decompression (with lots of misc stuff in there I
>> haven't bothered to tally).
>>
>> If someone were to optimize this code path I think we could take a patch
>> in 0.8.1. It shouldn't be too hard: just use the backing array on the
>> byte buffer and avoid all the input streams, output streams, byte array
>> output streams, and intermediate message blobs.
>>
>> I summarized this along with how to reproduce the test results here:
>> https://issues.apache.org/jira/browse/KAFKA-527
>>
>> -Jay
>>
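To make Guozhang's second workaround concrete, here is a minimal consumer-side sketch in Java. The names in it (InnerMessage, absoluteOffsets) are made up for illustration and are not taken from ByteBufferMessageSet or the real deep iterator; it assumes the wrapper message carries the offset of the last inner message + 1, as described above.

import java.util.ArrayList;
import java.util.List;

public class DeltaOffsetSketch {

    // One decompressed inner message: a signed offset delta plus its payload.
    static final class InnerMessage {
        final long delta;      // relative to the wrapper message's offset
        final byte[] payload;

        InnerMessage(long delta, byte[] payload) {
            this.delta = delta;
            this.payload = payload;
        }
    }

    // Consumer-side deep iteration: the broker only rewrote the wrapper
    // offset in assignOffsets, so absolute offsets are reconstructed here.
    static List<Long> absoluteOffsets(long wrapperOffset, List<InnerMessage> inner) {
        List<Long> offsets = new ArrayList<Long>();
        for (InnerMessage m : inner) {
            offsets.add(wrapperOffset + m.delta);   // delta is negative
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Say the broker assigned the wrapper message offset 106, i.e. the
        // offset of the last inner message + 1. Deltas were written at
        // compression time; after log compaction they could skip values
        // (e.g. -3, -1) to preserve holes.
        List<InnerMessage> inner = new ArrayList<InnerMessage>();
        inner.add(new InnerMessage(-3, "a".getBytes()));
        inner.add(new InnerMessage(-2, "b".getBytes()));
        inner.add(new InnerMessage(-1, "c".getBytes()));
        System.out.println(absoluteOffsets(106, inner));   // prints [103, 104, 105]
    }
}

The nice property is that assignOffsets on the broker only has to rewrite the single wrapper offset; it never decompresses or touches the inner messages.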
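And here is a rough sketch of the kind of byte-copying fix Jay describes in KAFKA-527, assuming the message set sits in a heap ByteBuffer and we link against the org.xerial snappy-java library: compress the backing array in one call instead of chaining input streams, output streams, byte array output streams, and intermediate message blobs. This is not the actual patch, just an illustration of the idea.

import java.io.IOException;
import java.nio.ByteBuffer;
import org.xerial.snappy.Snappy;

public class BackingArraySnappy {

    // Compress the readable region [position, limit) of a heap ByteBuffer
    // in one call, with no intermediate streams or per-message copies.
    static ByteBuffer compress(ByteBuffer messages) throws IOException {
        byte[] src = messages.array();                       // heap buffers only
        int offset = messages.arrayOffset() + messages.position();
        int length = messages.remaining();

        byte[] dst = new byte[Snappy.maxCompressedLength(length)];
        int compressedSize = Snappy.rawCompress(src, offset, length, dst, 0);
        return ByteBuffer.wrap(dst, 0, compressedSize);
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer in = ByteBuffer.wrap(new byte[64 * 1024]);   // stand-in message set
        ByteBuffer out = compress(in);
        System.out.println("compressed " + in.capacity() + " -> " + out.remaining() + " bytes");
    }
}

If the message set ever lives in a direct buffer, snappy-java also has a ByteBuffer-to-ByteBuffer compress variant, but for heap buffers the backing-array path above avoids the extra copies entirely.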