> you see that with no compression 80% of the time goes to FileChannel.write,
> But with snappy enabled only 5% goes to writing data, 50% of the time goes
> to byte copying and allocation, and only about 22% goes to actual

I had a similar problem with MapDB; it was solved by using memory-mapped files. Not sure how it applies to this case.

Regards,
Jan Kotek

On Friday 02 August 2013 22:19:34 Jay Kreps wrote:
> Chris commented in another thread about the poor compression performance in
> 0.8, even with snappy.
>
> Indeed, if I run the linear log write throughput test on my laptop I see
> 75MB/sec with no compression and 17MB/sec with snappy.
>
> This is a little surprising, as snappy claims 200MB/sec round-trip performance
> (compress + uncompress) from Java. So what is going on?
>
> Well, you may remember I actually filed a bug a while back on all the
> inefficient byte copying in the compression path (KAFKA-527). I didn't
> think too much of it, other than that it is a bit sloppy, since after all
> computers are good at copying bytes, right?
>
> Turns out not so much. If you look at a profile of the standalone log test,
> you see that with no compression 80% of the time goes to FileChannel.write,
> which is reasonable since that is what a log does.
>
> But with snappy enabled only 5% goes to writing data, 50% of the time goes
> to byte copying and allocation, and only about 22% goes to actual
> compression and decompression (with lots of misc stuff in there that I
> haven't bothered to tally).
>
> If someone were to optimize this code path, I think we could take a patch in
> 0.8.1. It shouldn't be too hard: just use the backing array on the byte
> buffer and avoid all the input streams, output streams, byte array
> output streams, and intermediate message blobs.
>
> I summarized this along with how to reproduce the test results here:
> https://issues.apache.org/jira/browse/KAFKA-527
>
> -Jay
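The "backing array" approach Jay describes can be sketched roughly as below. This is only an illustration, not the Kafka patch: it uses the JDK's built-in `Deflater`/`Inflater` as a stand-in for snappy so it runs with no extra dependencies, and the class and method names are hypothetical. The point is the pattern: feed the compressor the heap `ByteBuffer`'s backing array directly (via `array()`, `arrayOffset()`, `position()`, `remaining()`) instead of wrapping everything in input streams, output streams, and `ByteArrayOutputStream` copies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BackingArrayCompress {

    // Compress the readable bytes of a heap ByteBuffer with no
    // intermediate streams: hand the compressor the backing array,
    // offset by arrayOffset() + position(), for remaining() bytes.
    public static byte[] compress(ByteBuffer buf) {
        Deflater deflater = new Deflater();
        deflater.setInput(buf.array(),
                          buf.arrayOffset() + buf.position(),
                          buf.remaining());
        deflater.finish();
        // Crude output bound, fine for a sketch; real code would loop
        // until deflater.finished() and grow the buffer as needed.
        byte[] out = new byte[buf.remaining() + 64];
        int n = deflater.deflate(out);
        deflater.end();
        return Arrays.copyOf(out, n);
    }

    // Decompress straight into a caller-supplied array, again with no
    // stream wrappers; returns the number of bytes produced.
    public static int decompress(byte[] packed, byte[] dest)
            throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        int n = inflater.inflate(dest);
        inflater.end();
        return n;
    }

    public static void main(String[] args) throws Exception {
        ByteBuffer msg = ByteBuffer.wrap(
                "hello hello hello hello".getBytes(StandardCharsets.UTF_8));
        byte[] packed = compress(msg);
        byte[] round = new byte[64];
        int n = decompress(packed, round);
        System.out.println(new String(round, 0, n, StandardCharsets.UTF_8));
    }
}
```

The same shape applies to a snappy-backed codec: snappy's raw Java API can also consume a `(byte[], offset, length)` triple, so the stream layers and intermediate blobs the profile blames are avoidable in the same way.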