Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression

Ben Stopford Wed, 25 Jan 2017 07:23:35 -0800

Is there more discussion to be had on this KIP, or should it be taken to a
vote?


On Mon, Jan 16, 2017 at 6:37 AM Dongjin Lee <[email protected]> wrote:

> I updated KIP-110 with JMH-measured benchmark results. Please review them
> when you have time. (The overall results have not changed.)
>
> Regards,
> Dongjin
>
> +1. Could anyone assign KAFKA-4514 to me?
>
> On Thu, Jan 12, 2017 at 11:39 AM, Dongjin Lee <[email protected]> wrote:
>
> > Okay, I will give it a try.
> > Thanks Ewen for the guidance!!
> >
> > Best,
> > Dongjin
> >
> > On Thu, Jan 12, 2017 at 6:44 AM, Ismael Juma <[email protected]> wrote:
> >
> >> That's a good point Ewen. Dongjin, you could use the branch that Ewen
> >> linked for the performance testing. It would also help validate the PR.
> >>
> >> Ismael
> >>
> >> On Wed, Jan 11, 2017 at 9:38 PM, Ewen Cheslack-Postava <[email protected]> wrote:
> >>
> >> > FYI, there's an outstanding patch for getting some JMH benchmarking set up:
> >> > https://github.com/apache/kafka/pull/1712 I haven't found time to review it
> >> > (and don't really know JMH well anyway) but it might be worth getting that
> >> > landed so we can use it for this as well.
> >> >
> >> > -Ewen
> >> >
> >> > On Wed, Jan 11, 2017 at 6:35 AM, Dongjin Lee <[email protected]> wrote:
> >> >
> >> > > Hi Ismael,
> >> > >
> >> > > 1. Regarding the compressed size: yes, lz4 is producing smaller output
> >> > > than gzip. In fact, my benchmark was inspired by the
> >> > > MessageCompressionTest#testCompressSize unit test, and the result is the
> >> > > same: 396 bytes for gzip and 387 bytes for lz4.
> >> > > 2. I agree that my (former) approach can produce unreliable output.
> >> > > However, I am having difficulty working out how to collect benchmark
> >> > > metrics from Kafka. As for JMH, which you recommended, I have only just
> >> > > started searching for information on it. If possible, could you give an
> >> > > example of how to use JMH against Kafka? That would be a great help.
> >> > >
> >> > > Regards,
> >> > > Dongjin
> >> > >
> >> > >                 _____________________________
> >> > > From: Ismael Juma <[email protected]>
> >> > > Sent: Wednesday, January 11, 2017 7:33 PM
> >> > > Subject: Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression
> >> > > To:  <[email protected]>
> >> > >
> >> > >
> >> > > Thanks Dongjin. I highly recommend using JMH for the benchmark; the
> >> > > existing one has a few problems that could result in unreliable results.
> >> > > Also, it's a bit surprising that LZ4 is producing smaller output than gzip.
> >> > > Is that right?
> >> > >
> >> > > Ismael
> >> > >
> >> > > On Wed, Jan 11, 2017 at 10:20 AM, Dongjin Lee <[email protected]> wrote:
> >> > >
> >> > > > Ismael,
> >> > > >
> >> > > > I pushed the benchmark code I used, with some updates (iterations: 20 ->
> >> > > > 1000). I also updated the KIP page with the new benchmark results.
> >> > > > Please review them when you have time. The attached screenshot shows how
> >> > > > to run the benchmarker.
> >> > > >
> >> > > > Thanks,
> >> > > > Dongjin
> >> > > >
> >> > > > On Tue, Jan 10, 2017 at 8:03 PM, Dongjin Lee <[email protected]> wrote:
> >> > > >
> >> > > >> Ismael,
> >> > > >>
> >> > > >> I see. Then, I will share the benchmark code I used by tomorrow.
> >> > > >> Thanks for your guidance.
> >> > > >>
> >> > > >> Best,
> >> > > >> Dongjin
> >> > > >>
> >> > > >> -----
> >> > > >>
> >> > > >> Dongjin Lee
> >> > > >>
> >> > > >> Software developer in Line+.
> >> > > >> So interested in massive-scale machine learning.
> >> > > >>
> >> > > >> facebook: www.facebook.com/dongjin.lee.kr
> >> > > >> linkedin: kr.linkedin.com/in/dongjinleekr
> >> > > >> github: github.com/dongjinleekr
> >> > > >> twitter: www.twitter.com/dongjinleekr
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >> On Tue, Jan 10, 2017 at 7:24 PM +0900, "Ismael Juma" <[email protected]> wrote:
> >> > > >>
> >> > > >> Dongjin,
> >> > > >>>
> >> > > >>> The KIP states:
> >> > > >>>
> >> > > >>> "I compared the compressed size and compression time of 3 1kb-sized
> >> > > >>> messages (3102 bytes in total), with the Draft-implementation of
> >> > > >>> ZStandard Compression Codec and all currently available
> >> > > >>> CompressionCodecs. All elapsed times are the average of 20 trials."
> >> > > >>>
> >> > > >>> But it doesn't give any details of how this was implemented. Is the
> >> > > >>> source code available somewhere? Micro-benchmarking in the JVM is
> >> > > >>> pretty tricky, so it needs verification before the numbers can be
> >> > > >>> trusted. A performance test with kafka-producer-perf-test.sh would be
> >> > > >>> nice to have as well, if possible.
> >> > > >>>
> >> > > >>> Thanks,
> >> > > >>> Ismael
> >> > > >>>
> >> > > >>> On Tue, Jan 10, 2017 at 7:44 AM, Dongjin Lee  wrote:
> >> > > >>>
> >> > > >>> > Ismael,
> >> > > >>> >
> >> > > >>> > 1. Is the benchmark on the KIP page not enough? Do you mean we need a
> >> > > >>> > full performance test using kafka-producer-perf-test.sh?
> >> > > >>> >
> >> > > >>> > 2. It seems that no major project currently relies on it. However,
> >> > > >>> > after reviewing the code, I concluded that the project at least has
> >> > > >>> > good test coverage. As for upstream tracking: although there has been
> >> > > >>> > no significant ZStandard update by which to judge it, it does not look
> >> > > >>> > bad. If required, I can take responsibility for tracking this library.
> >> > > >>> >
> >> > > >>> > Thanks,
> >> > > >>> > Dongjin
> >> > > >>> >
> >> > > >>> > On Tue, Jan 10, 2017 at 7:09 AM, Ismael Juma  wrote:
> >> > > >>> >
> >> > > >>> > > Thanks for posting the KIP, ZStandard looks like a nice improvement
> >> > > >>> > > over the existing compression algorithms. A couple of questions:
> >> > > >>> > >
> >> > > >>> > > 1. Can you please elaborate on the details of the benchmark?
> >> > > >>> > > 2. About https://github.com/luben/zstd-jni, can we rely on it? A few
> >> > > >>> > > things to consider: are there other projects using it, does it have
> >> > > >>> > > good test coverage, are there performance tests, does it track
> >> > > >>> > > upstream closely?
> >> > > >>> > >
> >> > > >>> > > Thanks,
> >> > > >>> > > Ismael
> >> > > >>> > >
> >> > > >>> > > On Fri, Jan 6, 2017 at 2:40 AM, Dongjin Lee  wrote:
> >> > > >>> > >
> >> > > >>> > > > Hi all,
> >> > > >>> > > >
> >> > > >>> > > > I've just posted a new KIP "KIP-110: Add Codec for ZStandard
> >> > > >>> > > > Compression" for discussion:
> >> > > >>> > > >
> >> > > >>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> >> > > >>> > > >
> >> > > >>> > > > Please have a look when you are free.
> >> > > >>> > > >
> >> > > >>> > > > Best,
> >> > > >>> > > > Dongjin
> >> > > >>> > > >
> >> > > >>> > > >
> >> > > >>> > >
> >> > > >>> >
> >> > > >>> >
> >> > > >>> >
> >> > > >>> >
> >> > > >>>
> >> > > >>>
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >>
> >
> >
> >
> >
>
>
>
>
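
For readers following the benchmark discussion quoted above, here is a minimal
sketch of the kind of measurement being debated: the compressed size of a
3102-byte payload, and an average compression time taken only after a warm-up
phase. It is an illustration under assumptions, not the code used for the KIP.
It uses only the JDK's gzip streams (not Kafka's codecs and not zstd-jni), the
class name, method names and payload contents are hypothetical, and the
hand-rolled timing loop is a rough stand-in for JMH, which handles warm-up,
forking and dead-code elimination far more rigorously.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedSizeSketch {

    // Hypothetical payload: a short record repeated until `size` bytes are filled.
    public static byte[] samplePayload(int size) {
        byte[] unit = "key=user-42,value=some moderately repetitive record;"
                .getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[size];
        for (int i = 0; i < size; i++) {
            out[i] = unit[i % unit.length];
        }
        return out;
    }

    // Compress with the JDK gzip stream; closing the stream writes the trailer.
    public static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompress, used here only to check that the round trip is lossless.
    public static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gz.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Average nanoseconds per gzip call, timed only after `warmup` unmeasured calls.
    public static long averageNanos(byte[] input, int warmup, int measured) {
        if (measured <= 0) {
            throw new IllegalArgumentException("measured must be positive");
        }
        long sink = 0; // accumulate results so the JIT cannot discard the work
        for (int i = 0; i < warmup; i++) {
            sink += gzip(input).length;
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            sink += gzip(input).length;
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 0) {
            throw new IllegalStateException("gzip produced no output");
        }
        return elapsed / measured;
    }

    public static void main(String[] args) {
        byte[] payload = samplePayload(3102);
        System.out.println("input bytes:  " + payload.length);
        System.out.println("gzip bytes:   " + gzip(payload).length);
        System.out.println("avg ns/call:  " + averageNanos(payload, 2000, 1000));
    }
}
```

The absolute numbers depend on the JVM, the hardware and the payload, so they
are not comparable to the figures quoted in the thread. The point is only that
timing 20 cold iterations and timing 1000 iterations after warm-up can give
very different averages, which is the reliability concern raised above.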
