Hi Ismael,

1. Regarding the compressed size: yes, LZ4 does produce smaller output than gzip here. My benchmark was inspired by the MessageCompressionTest#testCompressSize unit test, and it gives the same result: 396 bytes for gzip and 387 bytes for LZ4.

2. I agree that my (former) approach can produce unreliable results. However, I am having difficulty working out how to acquire benchmark metrics from Kafka. As for JMH, which you recommended, I have only just started looking into it. If possible, could you give an example of how to use JMH against Kafka? That would be a great help.

Regards,
Dongjin
_____________________________
From: Ismael Juma <ism...@juma.me.uk>
Sent: Wednesday, January 11, 2017 7:33 PM
Subject: Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression
To: <dev@kafka.apache.org>

Thanks Dongjin. I highly recommend using JMH for the benchmark, the existing one has a few problems that could result in unreliable results. Also, it's a bit surprising that LZ4 is producing smaller output than gzip. Is that right?

Ismael

On Wed, Jan 11, 2017 at 10:20 AM, Dongjin Lee <dong...@apache.org> wrote:

> Ismael,
>
> I pushed the benchmark code I used, with some updates (iteration: 20 ->
> 1000). I also updated the KIP page with the updated benchmark results.
> Please take a review when you are free. The attached screenshot shows how
> to run the benchmarker.
>
> Thanks,
> Dongjin
>
> On Tue, Jan 10, 2017 at 8:03 PM, Dongjin Lee <dong...@apache.org> wrote:
>
>> Ismael,
>>
>> I see. Then, I will share the benchmark code I used by tomorrow. Thanks
>> for your guidance.
>>
>> Best,
>> Dongjin
>>
>> -----
>>
>> Dongjin Lee
>>
>> Software developer in Line+.
>> So interested in massive-scale machine learning.
>>
>> facebook: www.facebook.com/dongjin.lee.kr
>> linkedin: kr.linkedin.com/in/dongjinleekr
>> github: github.com/dongjinleekr
>> twitter: www.twitter.com/dongjinleekr
>>
>> On Tue, Jan 10, 2017 at 7:24 PM +0900, "Ismael Juma" <ism...@juma.me.uk>
>> wrote:
>>
>>> Dongjin,
>>>
>>> The KIP states:
>>>
>>> "I compared the compressed size and compression time of 3 1kb-sized
>>> messages (3102 bytes in total), with the Draft-implementation of ZStandard
>>> Compression Codec and all currently available CompressionCodecs. All
>>> elapsed times are the average of 20 trials."
>>>
>>> But doesn't give any details of how this was implemented. Is the source
>>> code available somewhere? Micro-benchmarking in the JVM is pretty tricky so
>>> it needs verification before numbers can be trusted. A performance test
>>> with kafka-producer-perf-test.sh would be nice to have as well, if possible.
>>>
>>> Thanks,
>>> Ismael
>>>
>>> On Tue, Jan 10, 2017 at 7:44 AM, Dongjin Lee wrote:
>>>
>>> > Ismael,
>>> >
>>> > 1. Is the benchmark in the KIP page not enough? You mean we need a whole
>>> > performance test using kafka-producer-perf-test.sh?
>>> >
>>> > 2. It seems like no major project is relying on it currently. However,
>>> > after reviewing the code, I concluded that at least this project has a
>>> > good test coverage. And for the problem of upstream tracking - although
>>> > there is no significant update on ZStandard to judge this problem, it
>>> > seems not bad. If required, I can take responsibility of the tracking
>>> > for this library.
>>> >
>>> > Thanks,
>>> > Dongjin
>>> >
>>> > On Tue, Jan 10, 2017 at 7:09 AM, Ismael Juma wrote:
>>> >
>>> > > Thanks for posting the KIP, ZStandard looks like a nice improvement over
>>> > > the existing compression algorithms. A couple of questions:
>>> > >
>>> > > 1. Can you please elaborate on the details of the benchmark?
>>> > > 2. About https://github.com/luben/zstd-jni, can we rely on it? A few
>>> > > things to consider: are there other projects using it, does it have
>>> > > good test coverage, are there performance tests, does it track
>>> > > upstream closely?
>>> > >
>>> > > Thanks,
>>> > > Ismael
>>> > >
>>> > > On Fri, Jan 6, 2017 at 2:40 AM, Dongjin Lee wrote:
>>> > >
>>> > > > Hi all,
>>> > > >
>>> > > > I've just posted a new KIP "KIP-110: Add Codec for ZStandard
>>> > > > Compression" for discussion:
>>> > > >
>>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
>>> > > >
>>> > > > Please have a look when you are free.
>>> > > >
>>> > > > Best,
>>> > > > Dongjin