> you see that with no compression 80% of the time goes to FileChannel.write,
> But with snappy enabled only 5% goes to writing data, 50% of the time goes
> to byte copying and allocation, and only about 22% goes to actual

I had a similar problem with MapDB; it was solved by using memory-mapped files. Not sure how it applies to this case.

Regards,
Jan Kotek

On Friday 02 August 2013 22:19:34 Jay Kreps wrote:
> Chris commented in another thread about the poor compression performance in
> 0.8, even with snappy.
>
> Indeed, if I run the linear log write throughput test on my laptop I see
> 75MB/sec with no compression and 17MB/sec with snappy.
>
> This is a little surprising, as snappy claims 200MB/sec round-trip performance
> (compress + uncompress) from Java. So what is going on?
>
> Well, you may remember I actually filed a bug a while back on all the
> inefficient byte copying in the compression path (KAFKA-527). I didn't
> think too much of it, other than that it is a bit sloppy, since after all
> computers are good at copying bytes, right?
>
> Turns out not so much. If you look at a profile of the standalone log test,
> you see that with no compression 80% of the time goes to FileChannel.write,
> which is reasonable since that is what a log does.
>
> But with snappy enabled only 5% goes to writing data, 50% of the time goes
> to byte copying and allocation, and only about 22% goes to actual
> compression and decompression (with lots of misc stuff in there that I
> haven't bothered to tally).
>
> If someone were to optimize this code path, I think we could take a patch in
> 0.8.1. It shouldn't be too hard: just use the backing array on the byte
> buffer and avoid all the input streams, output streams, byte array
> output streams, and intermediate message blobs.
>
> I summarized this along with how to reproduce the test results here:
> https://issues.apache.org/jira/browse/KAFKA-527
>
> -Jay
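The "backing array" approach Jay describes can be sketched roughly as below. This is only an illustration, not the Kafka patch: it uses the JDK's built-in `Deflater`/`Inflater` as a stand-in for snappy so it runs with no extra dependencies, and the class and method names are hypothetical. The point is the pattern: feed the compressor the heap `ByteBuffer`'s backing array directly (via `array()`, `arrayOffset()`, `position()`, `remaining()`) instead of wrapping everything in input streams, output streams, and `ByteArrayOutputStream` copies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BackingArrayCompress {

    // Compress the readable bytes of a heap ByteBuffer with no
    // intermediate streams: hand the compressor the backing array,
    // offset by arrayOffset() + position(), for remaining() bytes.
    public static byte[] compress(ByteBuffer buf) {
        Deflater deflater = new Deflater();
        deflater.setInput(buf.array(),
                          buf.arrayOffset() + buf.position(),
                          buf.remaining());
        deflater.finish();
        // Crude output bound, fine for a sketch; real code would loop
        // until deflater.finished() and grow the buffer as needed.
        byte[] out = new byte[buf.remaining() + 64];
        int n = deflater.deflate(out);
        deflater.end();
        return Arrays.copyOf(out, n);
    }

    // Decompress straight into a caller-supplied array, again with no
    // stream wrappers; returns the number of bytes produced.
    public static int decompress(byte[] packed, byte[] dest)
            throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        int n = inflater.inflate(dest);
        inflater.end();
        return n;
    }

    public static void main(String[] args) throws Exception {
        ByteBuffer msg = ByteBuffer.wrap(
                "hello hello hello hello".getBytes(StandardCharsets.UTF_8));
        byte[] packed = compress(msg);
        byte[] round = new byte[64];
        int n = decompress(packed, round);
        System.out.println(new String(round, 0, n, StandardCharsets.UTF_8));
    }
}
```

The same shape applies to a snappy-backed codec: snappy's raw Java API can also consume a `(byte[], offset, length)` triple, so the stream layers and intermediate blobs the profile blames are avoidable in the same way.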