Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression

Dongjin Lee Wed, 11 Jan 2017 18:40:16 -0800

Okay, I will give it a try.
Thanks, Ewen, for the guidance!

Best,
Dongjin


On Thu, Jan 12, 2017 at 6:44 AM, Ismael Juma <[email protected]> wrote:

> That's a good point Ewen. Dongjin, you could use the branch that Ewen
> linked for the performance testing. It would also help validate the PR.
>
> Ismael
>
> On Wed, Jan 11, 2017 at 9:38 PM, Ewen Cheslack-Postava <[email protected]>
> wrote:
>
> > FYI, there's an outstanding patch for getting some JMH benchmarking
> > setup: https://github.com/apache/kafka/pull/1712 I haven't found time to
> > review it (and don't really know JMH well anyway) but it might be worth
> > getting that landed so we can use it for this as well.
> >
> > -Ewen
> >
> > On Wed, Jan 11, 2017 at 6:35 AM, Dongjin Lee <[email protected]> wrote:
> >
> > > Hi Ismael,
> > >
> > > 1. In the case of compression output, yes, lz4 is producing smaller
> > > output than gzip. In fact, my benchmark was inspired by the
> > > MessageCompressionTest#testCompressSize unit test, and the result is
> > > the same: 396 bytes for gzip and 387 bytes for lz4.
> > > 2. I agree that my (former) approach can produce unreliable output.
> > > However, I am having difficulty working out how to acquire the benchmark
> > > metrics from Kafka. As for JMH, which you recommended, I have only just
> > > started to google it. If possible, could you give an example of how to
> > > use JMH against Kafka? That would be a great help.
> > >
> > > Regards,
> > > Dongjin
> > >
> > >                 _____________________________
> > > From: Ismael Juma <[email protected]>
> > > Sent: Wednesday, January 11, 2017 7:33 PM
> > > Subject: Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression
> > > To:  <[email protected]>
> > >
> > >
> > > Thanks Dongjin. I highly recommend using JMH for the benchmark; the
> > > existing one has a few problems that could result in unreliable results.
> > > Also, it's a bit surprising that LZ4 is producing smaller output than
> > > gzip. Is that right?
> > >
> > > Ismael
> > >
> > > On Wed, Jan 11, 2017 at 10:20 AM, Dongjin Lee <[email protected]> wrote:
> > >
> > > > Ismael,
> > > >
> > > > I pushed the benchmark code I used, with some updates (iterations: 20 ->
> > > > 1000). I also updated the KIP page with the new benchmark results.
> > > > Please take a look when you are free. The attached screenshot shows how
> > > > to run the benchmarker.
> > > >
> > > > Thanks,
> > > > Dongjin
> > > >
> > > > On Tue, Jan 10, 2017 at 8:03 PM, Dongjin Lee <[email protected]> wrote:
> > > >
> > > >> Ismael,
> > > >>
> > > >> I see. Then, I will share the benchmark code I used by tomorrow. Thanks
> > > >> for your guidance.
> > > >>
> > > >> Best,
> > > >> Dongjin
> > > >>
> > > >> -----
> > > >>
> > > >> Dongjin Lee
> > > >>
> > > >> Software developer in Line+.
> > > >> So interested in massive-scale machine learning.
> > > >>
> > > >> facebook: www.facebook.com/dongjin.lee.kr
> > > >> linkedin: kr.linkedin.com/in/dongjinleekr
> > > >> github: github.com/dongjinleekr
> > > >> twitter: www.twitter.com/dongjinleekr
> > > >>
> > > >>
> > > >>
> > > >>
> > > > >> On Tue, Jan 10, 2017 at 7:24 PM +0900, "Ismael Juma" <[email protected]> wrote:
> > > >>
> > > > >>> Dongjin,
> > > > >>>
> > > > >>> The KIP states:
> > > > >>>
> > > > >>> "I compared the compressed size and compression time of 3 1kb-sized
> > > > >>> messages (3102 bytes in total), with the Draft-implementation of
> > > > >>> ZStandard Compression Codec and all currently available
> > > > >>> CompressionCodecs. All elapsed times are the average of 20 trials."
> > > > >>>
> > > > >>> But it doesn't give any details of how this was implemented. Is the
> > > > >>> source code available somewhere? Micro-benchmarking in the JVM is
> > > > >>> pretty tricky, so it needs verification before numbers can be trusted.
> > > > >>> A performance test with kafka-producer-perf-test.sh would be nice to
> > > > >>> have as well, if possible.
> > > >>>
> > > >>> Thanks,
> > > >>> Ismael
> > > >>>
> > > >>> On Tue, Jan 10, 2017 at 7:44 AM, Dongjin Lee  wrote:
> > > >>>
> > > >>> > Ismael,
> > > >>> >
> > > > >>> > 1. Is the benchmark in the KIP page not enough? Do you mean we need
> > > > >>> > a full performance test using kafka-producer-perf-test.sh?
> > > > >>> >
> > > > >>> > 2. It seems that no major project is relying on it currently.
> > > > >>> > However, after reviewing the code, I concluded that this project at
> > > > >>> > least has good test coverage. As for upstream tracking: there has
> > > > >>> > been no significant update to ZStandard yet by which to judge it,
> > > > >>> > but it does not look bad so far. If required, I can take
> > > > >>> > responsibility for tracking this library.
> > > >>> >
> > > >>> > Thanks,
> > > >>> > Dongjin
> > > >>> >
> > > >>> > On Tue, Jan 10, 2017 at 7:09 AM, Ismael Juma  wrote:
> > > >>> >
> > > > >>> > > Thanks for posting the KIP, ZStandard looks like a nice
> > > > >>> > > improvement over the existing compression algorithms. A couple of
> > > > >>> > > questions:
> > > > >>> > >
> > > > >>> > > 1. Can you please elaborate on the details of the benchmark?
> > > > >>> > > 2. About https://github.com/luben/zstd-jni, can we rely on it? A
> > > > >>> > > few things to consider: are there other projects using it, does it
> > > > >>> > > have good test coverage, are there performance tests, does it
> > > > >>> > > track upstream closely?
> > > >>> > >
> > > >>> > > Thanks,
> > > >>> > > Ismael
> > > >>> > >
> > > >>> > > On Fri, Jan 6, 2017 at 2:40 AM, Dongjin Lee  wrote:
> > > >>> > >
> > > >>> > > > Hi all,
> > > >>> > > >
> > > > >>> > > > I've just posted a new KIP "KIP-110: Add Codec for ZStandard
> > > > >>> > > > Compression" for discussion:
> > > > >>> > > >
> > > > >>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> > > >>> > > >
> > > >>> > > > Please have a look when you are free.
> > > >>> > > >
> > > >>> > > > Best,
> > > >>> > > > Dongjin
> > > >>> > > >
> > > >>> > > > --
> > > >>> > > > *Dongjin Lee*
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > *Software developer in Line+.So interested in massive-scale
> > > machine
> > > >>> > > > learning.facebook: www.facebook.com/dongjin.lee.kr
> > > >>> > > > linkedin:
> > > >>> > > > kr.linkedin.com/in/dongjinleekr
> > > >>> > > > github:
> > > >>> > > > github.com/dongjinleekr
> > > >>> > > > twitter: www.twitter.com/dongjinleekr
> > > >>> > > > *
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> > --
> > > >>> > *Dongjin Lee*
> > > >>> >
> > > >>> >
> > > >>> > *Software developer in Line+.So interested in massive-scale
> machine
> > > >>> > learning.facebook: www.facebook.com/dongjin.lee.kr
> > > >>> > linkedin:
> > > >>> > kr.linkedin.com/in/dongjinleekr
> > > >>> > github:
> > > >>> > github.com/dongjinleekr
> > > >>> > twitter: www.twitter.com/dongjinleekr
> > > >>> > *
> > > >>> >
> > > >>>
> > > >>>
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > >
> > > > *Software developer in Line+.So interested in massive-scale machine
> > > > learning.facebook: www.facebook.com/dongjin.lee.kr
> > > > <http://www.facebook.com/dongjin.lee.kr>linkedin:
> kr.linkedin.com/in/
> > > dongjinleekr
> > > > <http://kr.linkedin.com/in/dongjinleekr>github:
> > > > <http://goog_969573159/>github.com/dongjinleekr
> > > > <http://github.com/dongjinleekr>twitter:
> www.twitter.com/dongjinleekr
> > > > <http://www.twitter.com/dongjinleekr>*
> > > >
> > >
> > >
> > >
> > >
> > >
> >
>
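
For reference, here is a minimal sketch of the kind of measurement discussed in this thread: compressed size plus average compression time for three 1 KB messages. It is not the benchmark code from the KIP and not a JMH benchmark. It uses only `java.util.zip` from the JDK, the class name `CompressionSizeSketch`, the helper names, and the repeating sample payload are all hypothetical, and gzip stands in for whichever codec is under test. Running it with and without warm-up iterations gives a rough feel for why a 20-trial average on a cold JVM is hard to trust.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressionSizeSketch {

    // Builds `count` messages of `size` bytes each from a repeating text
    // pattern. Hypothetical data, not the payload used in the KIP.
    static byte[] samplePayload(int count, int size) {
        byte[] pattern = "kafka-zstd-benchmark-".getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[count * size];
        for (int i = 0; i < out.length; i++) {
            out[i] = pattern[i % pattern.length];
        }
        return out;
    }

    // Compresses the input with the JDK's gzip stream.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(input);
        }
        return buffer.toByteArray();
    }

    // Decompresses gzip data; used only to check the round trip.
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gz.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        }
        return buffer.toByteArray();
    }

    // Average nanoseconds per gzip call over `iterations` timed calls,
    // after `warmup` untimed calls that give the JIT a chance to compile.
    static long averageNanos(byte[] input, int warmup, int iterations) throws IOException {
        long sink = 0; // consume results so the calls are not optimized away
        for (int i = 0; i < warmup; i++) {
            sink += gzip(input).length;
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += gzip(input).length;
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 0) {
            throw new IllegalStateException("gzip produced no output");
        }
        return elapsed / iterations;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = samplePayload(3, 1024);
        System.out.println("input bytes: " + payload.length);
        System.out.println("gzip bytes:  " + gzip(payload).length);
        System.out.println("avg ns, no warm-up, 20 runs:     " + averageNanos(payload, 0, 20));
        System.out.println("avg ns, 2000 warm-up, 1000 runs: " + averageNanos(payload, 2000, 1000));
    }
}
```

The two averages will usually differ noticeably, and both still vary from run to run. A hand-rolled loop like this does not control for dead-code elimination, GC pauses, or JIT deoptimization the way JMH does, so it only illustrates the problem and is not a substitute for a JMH benchmark.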



-- 
*Dongjin Lee*

Software developer in Line+.
So interested in massive-scale machine learning.

facebook: www.facebook.com/dongjin.lee.kr
linkedin: kr.linkedin.com/in/dongjinleekr
github: github.com/dongjinleekr
twitter: www.twitter.com/dongjinleekr
