Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression

Ismael Juma Wed, 11 Jan 2017 13:46:10 -0800

That's a good point, Ewen. Dongjin, you could use the branch that Ewen
linked for the performance testing. It would also help validate the PR.


Ismael
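For reference, here is a minimal stdlib-only Java sketch of the compressed-size half of the comparison discussed below: the gzip output size for three 1 KB messages. It is illustrative only. The class name, the helper names (`gzipSize`, `batch`), and the message contents are hypothetical, and they do not reproduce the KIP's benchmark or MessageCompressionTest#testCompressSize. Only gzip is covered, because LZ4 and ZStandard are not part of the JDK and would need external libraries such as zstd-jni. The sketch measures size only. Timing comparisons should still go through JMH, as recommended in the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical, stdlib-only sketch: reports how many bytes gzip produces
// for a small batch of messages. LZ4 and ZStandard are not in the JDK.
public class CompressedSizeCheck {

    // Compress the whole input as one gzip stream and return the output length.
    static int gzipSize(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(input);
        } catch (IOException e) {
            // ByteArrayOutputStream does no real I/O, so this is not expected.
            throw new UncheckedIOException(e);
        }
        // Closing the gzip stream (try-with-resources) wrote the trailer into `out`.
        return out.size();
    }

    // Build `count` hypothetical messages of `size` ASCII bytes each, concatenated.
    static byte[] batch(int count, int size) {
        StringBuilder sb = new StringBuilder(count * size);
        for (int m = 0; m < count; m++) {
            for (int i = 0; i < size; i++) {
                sb.append((char) ('a' + (i * 7 + m) % 26));
            }
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] input = batch(3, 1024); // three 1 KB messages, 3072 bytes raw
        System.out.println("raw bytes:  " + input.length);
        System.out.println("gzip bytes: " + gzipSize(input));
    }
}
```

For inputs this small, gzip's fixed container overhead (a 10-byte header and an 8-byte trailer) is a noticeable share of the output. A few bytes' difference between codecs on roughly 3 KB of data may therefore say little about compression ratios on larger batches.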

On Wed, Jan 11, 2017 at 9:38 PM, Ewen Cheslack-Postava <[email protected]>
wrote:

> FYI, there's an outstanding patch for getting some JMH benchmarking set up:
> https://github.com/apache/kafka/pull/1712
> I haven't found time to review it (and don't really know JMH well anyway),
> but it might be worth getting that landed so we can use it for this as well.
>
> -Ewen
>
> On Wed, Jan 11, 2017 at 6:35 AM, Dongjin Lee <[email protected]> wrote:
>
> > Hi Ismael,
> >
> > 1. As for the compressed output, yes, LZ4 produces smaller output than
> > gzip. In fact, my benchmark was inspired by the
> > MessageCompressionTest#testCompressSize unit test, and the result is the
> > same: 396 bytes for gzip and 387 bytes for LZ4.
> > 2. I agree that my (former) approach can produce unreliable results.
> > However, I am having difficulty figuring out how to collect benchmark
> > metrics from Kafka. As for JMH, which you recommended, I have just started
> > reading up on it. If possible, could you give an example of how to use JMH
> > against Kafka? That would be a great help.
> > Regards,
> > Dongjin
> >
> >                 _____________________________
> > From: Ismael Juma <[email protected]>
> > Sent: Wednesday, January 11, 2017 7:33 PM
> > Subject: Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression
> > To:  <[email protected]>
> >
> >
> > Thanks, Dongjin. I highly recommend using JMH for the benchmark; the
> > existing one has a few problems that could produce unreliable results.
> > Also, it's a bit surprising that LZ4 is producing smaller output than gzip.
> > Is that right?
> >
> > Ismael
> >
> > On Wed, Jan 11, 2017 at 10:20 AM, Dongjin Lee <[email protected]>
> > wrote:
> >
> > > Ismael,
> > >
> > > I pushed the benchmark code I used, with some updates (iterations: 20 ->
> > > 1000). I also updated the KIP page with the new benchmark results.
> > > Please take a look when you are free. The attached screenshot shows how
> > > to run the benchmark.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > On Tue, Jan 10, 2017 at 8:03 PM, Dongjin Lee <[email protected]>
> > > wrote:
> > >
> > >> Ismael,
> > >>
> > >> I see. I will share the benchmark code I used by tomorrow. Thanks for
> > >> your guidance.
> > >>
> > >> Best,
> > >> Dongjin
> > >>
> > >> -----
> > >>
> > >> Dongjin Lee
> > >>
> > >> Software developer in Line+.
> > >> So interested in massive-scale machine learning.
> > >>
> > >> facebook: www.facebook.com/dongjin.lee.kr
> > >> linkedin: kr.linkedin.com/in/dongjinleekr
> > >> github: github.com/dongjinleekr
> > >> twitter: www.twitter.com/dongjinleekr
> > >>
> > >>
> > >>
> > >>
> > >> On Tue, Jan 10, 2017 at 7:24 PM +0900, "Ismael Juma" <[email protected]>
> > >> wrote:
> > >>
> > >> Dongjin,
> > >>>
> > >>> The KIP states:
> > >>>
> > >>> "I compared the compressed size and compression time of 3 1kb-sized
> > >>> messages (3102 bytes in total), with the Draft-implementation of ZStandard
> > >>> Compression Codec and all currently available CompressionCodecs. All
> > >>> elapsed times are the average of 20 trials."
> > >>>
> > >>> But it doesn't give any details of how this was implemented. Is the
> > >>> source code available somewhere? Micro-benchmarking on the JVM is pretty
> > >>> tricky, so it needs verification before the numbers can be trusted. A
> > >>> performance test with kafka-producer-perf-test.sh would be nice to have
> > >>> as well, if possible.
> > >>>
> > >>> Thanks,
> > >>> Ismael
> > >>>
> > >>> On Tue, Jan 10, 2017 at 7:44 AM, Dongjin Lee  wrote:
> > >>>
> > >>> > Ismael,
> > >>> >
> > >>> > 1. Is the benchmark on the KIP page not enough? Do you mean we need a
> > >>> > full performance test using kafka-producer-perf-test.sh?
> > >>> >
> > >>> > 2. It seems that no major project relies on it currently. However,
> > >>> > after reviewing the code, I concluded that this project at least has
> > >>> > good test coverage. As for tracking upstream: although there has been
> > >>> > no significant ZStandard update yet to judge by, it looks reasonable so
> > >>> > far. If required, I can take responsibility for tracking this library.
> > >>> >
> > >>> > Thanks,
> > >>> > Dongjin
> > >>> >
> > >>> > On Tue, Jan 10, 2017 at 7:09 AM, Ismael Juma  wrote:
> > >>> >
> > >>> > > Thanks for posting the KIP; ZStandard looks like a nice improvement
> > >>> > > over the existing compression algorithms. A couple of questions:
> > >>> > >
> > >>> > > 1. Can you please elaborate on the details of the benchmark?
> > >>> > > 2. About https://github.com/luben/zstd-jni, can we rely on it? A few
> > >>> > > things to consider: are there other projects using it, does it have
> > >>> > > good test coverage, are there performance tests, and does it track
> > >>> > > upstream closely?
> > >>> > >
> > >>> > > Thanks,
> > >>> > > Ismael
> > >>> > >
> > >>> > > On Fri, Jan 6, 2017 at 2:40 AM, Dongjin Lee  wrote:
> > >>> > >
> > >>> > > > Hi all,
> > >>> > > >
> > >>> > > > I've just posted a new KIP "KIP-110: Add Codec for ZStandard
> > >>> > > > Compression" for discussion:
> > >>> > > >
> > >>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> > >>> > > >
> > >>> > > > Please have a look when you are free.
> > >>> > > >
> > >>> > > > Best,
> > >>> > > > Dongjin
> > >>> > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>>
> > >>>
> > >
> > >
> > >
> >
> >
> >
> >
> >
>
