Ismael & All,

After Inspecting the related code & commits, I concluded following:

1. We have to update the masking value which is used to retrieve the used
codec id from the messages, to enable the retrieval of the 3rd bit of
compression type field of the message.
2. The above task is already done; so, we need nothing.

Here is why.

Let's start with the first one, with the scenario Juma proposed. In the
case of receiving the message of unsupported compression type, the receiver
(= broker or consumer) raises IllegalArgumentException[^1][^2]. The key
element in this operation is Record#COMPRESSION_CODEC_MASK, which is used
to extract the codec id. We have to update this value from 2-bits extractor
(=0x03) to 3-bits extractor (=0x07).

But in fact, this task is already done, so its current value is 0x07. We
don't have to update it.

The reason why this task is already done has some story; From the first
time Record.java file was added to the project[^3], the
COMPRESSION_CODEC_MASK was already 2-bits extractor, that is, 0x03. At that
time, Kafka supported only two compression types - GZipCompression and
SnappyCompression.[^4] After that, KAFKA-1456 introduced two additional
codecs of LZ4 and LZ4C[^5]. This update modified COMPRESSION_CODEC_MASK
into 3 bits extractor, 0x07, in the aim of supporting four compression
codecs.

Although its following work, KAFKA-1493, removed the support of LZ4C
codec[^6], this mask was not reverted into 2-bits extractor - by this
reason, we don't need to care about the message format.

Attached screenshot is my test on Juma's scenario. I created topic & sent
some messages using the snapshot version with ZStandard compression and
received the message with the latest version. As you can see, it works
perfectly as expected.

If you have more opinion to this issue, don't hesitate to send me as a
message.

Best,
Dongjin

[^1]: see: Record#compressionType.
[^2]: There is similar routine in Message.scala. But after KAFKA-4390, that
routine is not being used anymore - more precisely, Message class is now
used in ConsoleConsumer only. I think this class should be replaced but
since it is a separated topic, I will send another message for this issue.
[^3]: commit 642da2f (2011.8.2).
[^4]: commit c51b940.
[^5]: commit 547cced.
[^6]: commit 37356bf.

On Thu, Jan 26, 2017 at 12:35 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> So far the discussion was around the performance characteristics of the new
> compression algorithm. Another area that is important and is not covered in
> the KIP is the compatibility implications. For example, what happens if a
> consumer that doesn't support zstd tries to consume a topic compressed with
> it? Or if a broker that doesn't support receives data compressed with it?
> If we go through that exercise, then more changes may be required (like
> bumping the version of produce/fetch protocols).
>
> Ismael
>
> On Wed, Jan 25, 2017 at 3:22 PM, Ben Stopford <b...@confluent.io> wrote:
>
> > Is there more discussion to be had on this KIP, or should it be taken to
> a
> > vote?
> >
> > On Mon, Jan 16, 2017 at 6:37 AM Dongjin Lee <dong...@apache.org> wrote:
> >
> > > I updated KIP-110 with JMH-measured benchmark results. Please have a
> > review
> > > when you are free. (The overall result is not different yet.)
> > >
> > > Regards,
> > > Dongjin
> > >
> > > +1. Could anyone assign KAFKA-4514 to me?
> > >
> > > On Thu, Jan 12, 2017 at 11:39 AM, Dongjin Lee <dong...@apache.org>
> > wrote:
> > >
> > > > Okay, I will have a try.
> > > > Thanks Ewen for the guidance!!
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > On Thu, Jan 12, 2017 at 6:44 AM, Ismael Juma <ism...@juma.me.uk>
> > wrote:
> > > >
> > > >> That's a good point Ewen. Dongjin, you could use the branch that
> Ewen
> > > >> linked for the performance testing. It would also help validate the
> > PR.
> > > >>
> > > >> Ismael
> > > >>
> > > >> On Wed, Jan 11, 2017 at 9:38 PM, Ewen Cheslack-Postava <
> > > e...@confluent.io
> > > >> >
> > > >> wrote:
> > > >>
> > > >> > FYI, there's an outstanding patch for getting some JMH
> benchmarking
> > > >> setup:
> > > >> > https://github.com/apache/kafka/pull/1712 I haven't found time to
> > > >> review
> > > >> > it
> > > >> > (and don't really know JMH well anyway) but it might be worth
> > getting
> > > >> that
> > > >> > landed so we can use it for this as well.
> > > >> >
> > > >> > -Ewen
> > > >> >
> > > >> > On Wed, Jan 11, 2017 at 6:35 AM, Dongjin Lee <dong...@apache.org>
> > > >> wrote:
> > > >> >
> > > >> > > Hi Ismael,
> > > >> > >
> > > >> > > 1. In the case of compression output, yes, lz4 is producing the
> > > >> smaller
> > > >> > > output than gzip. In fact, my benchmark was inspired
> > > >> > > by MessageCompressionTest#testCompressSize unit test and the
> > result
> > > >> is
> > > >> > > same - 396 bytes for gzip and 387 bytes for lz4.
> > > >> > > 2. I agree that my (former) approach can result in unreliable
> > > output.
> > > >> > > However, I am experiencing difficulties on how to acquire the
> > > >> benchmark
> > > >> > > metrics from Kafka. For you recommended JMH, I just started to
> > > google
> > > >> for
> > > >> > > it. If possible, could you give any example on how to use JMH
> > > against
> > > >> > > Kafka? If it is the case, it will be a great help.
> > > >> > > Regards,Dongjin
> > > >> > >
> > > >> > >                 _____________________________
> > > >> > > From: Ismael Juma <ism...@juma.me.uk>
> > > >> > > Sent: Wednesday, January 11, 2017 7:33 PM
> > > >> > > Subject: Re: [DISCUSS] KIP-110: Add Codec for ZStandard
> > Compression
> > > >> > > To:  <dev@kafka.apache.org>
> > > >> > >
> > > >> > >
> > > >> > > Thanks Dongjin. I highly recommend using JMH for the benchmark,
> > the
> > > >> > > existing one has a few problems that could result in unreliable
> > > >> results.
> > > >> > > Also, it's a bit surprising that LZ4 is producing smaller output
> > > than
> > > >> > gzip.
> > > >> > > Is that right?
> > > >> > >
> > > >> > > Ismael
> > > >> > >
> > > >> > > On Wed, Jan 11, 2017 at 10:20 AM, Dongjin Lee <
> dong...@apache.org
> > >
> > > >> > wrote:
> > > >> > >
> > > >> > > > Ismael,
> > > >> > > >
> > > >> > > > I pushed the benchmark code I used, with some updates
> > (iteration:
> > > >> 20 ->
> > > >> > > > 1000). I also updated the KIP page with the updated benchmark
> > > >> results.
> > > >> > > > Please take a review when you are free. The attached
> screenshot
> > > >> shows
> > > >> > how
> > > >> > > > to run the benchmarker.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Dongjin
> > > >> > > >
> > > >> > > > On Tue, Jan 10, 2017 at 8:03 PM, Dongjin Lee <
> > dong...@apache.org>
> > > >> > wrote:
> > > >> > > >
> > > >> > > >> Ismael,
> > > >> > > >>
> > > >> > > >> I see. Then, I will share the benchmark code I used by
> > tomorrow.
> > > >> > Thanks
> > > >> > > >> for your guidance.
> > > >> > > >>
> > > >> > > >> Best,
> > > >> > > >> Dongjin
> > > >> > > >>
> > > >> > > >> -----
> > > >> > > >>
> > > >> > > >> Dongjin Lee
> > > >> > > >>
> > > >> > > >> Software developer in Line+.
> > > >> > > >> So interested in massive-scale machine learning.
> > > >> > > >>
> > > >> > > >> facebook: www.facebook.com/dongjin.lee.kr
> > > >> > > >> linkedin: kr.linkedin.com/in/dongjinleekr
> > > >> > > >> github: github.com/dongjinleekr
> > > >> > > >> twitter: www.twitter.com/dongjinleekr
> > > >> > > >>
> > > >> > > >>
> > > >> > > >>
> > > >> > > >>
> > > >> > > >> On Tue, Jan 10, 2017 at 7:24 PM +0900, "Ismael Juma" <
> > > >> > ism...@juma.me.uk
> > > >> > > >
> > > >> > > >> wrote:
> > > >> > > >>
> > > >> > > >> Dongjin,
> > > >> > > >>>
> > > >> > > >>> The KIP states:
> > > >> > > >>>
> > > >> > > >>> "I compared the compressed size and compression time of 3
> > > >> 1kb-sized
> > > >> > > >>> messages (3102 bytes in total), with the
> Draft-implementation
> > of
> > > >> > > ZStandard
> > > >> > > >>> Compression Codec and all currently available
> > CompressionCodecs.
> > > >> All
> > > >> > > >>> elapsed times are the average of 20 trials."
> > > >> > > >>>
> > > >> > > >>> But doesn't give any details of how this was implemented. Is
> > the
> > > >> > source
> > > >> > > >>> code available somewhere? Micro-benchmarking in the JVM is
> > > pretty
> > > >> > > tricky so
> > > >> > > >>> it needs verification before numbers can be trusted. A
> > > performance
> > > >> > test
> > > >> > > >>> with kafka-producer-perf-test.sh would be nice to have as
> > well,
> > > if
> > > >> > > possible.
> > > >> > > >>>
> > > >> > > >>> Thanks,
> > > >> > > >>> Ismael
> > > >> > > >>>
> > > >> > > >>> On Tue, Jan 10, 2017 at 7:44 AM, Dongjin Lee  wrote:
> > > >> > > >>>
> > > >> > > >>> > Ismael,
> > > >> > > >>> >
> > > >> > > >>> > 1. Is the benchmark in the KIP page not enough? You mean
> we
> > > >> need a
> > > >> > > whole
> > > >> > > >>> > performance test using kafka-producer-perf-test.sh?
> > > >> > > >>> >
> > > >> > > >>> > 2. It seems like no major project is relying on it
> > currently.
> > > >> > > However,
> > > >> > > >>> > after reviewing the code, I concluded that at least this
> > > project
> > > >> > has
> > > >> > > a good
> > > >> > > >>> > test coverage. And for the problem of upstream tracking -
> > > >> although
> > > >> > > there is
> > > >> > > >>> > no significant update on ZStandard to judge this problem,
> it
> > > >> seems
> > > >> > > not bad.
> > > >> > > >>> > If required, I can take responsibility of the tracking for
> > > this
> > > >> > > library.
> > > >> > > >>> >
> > > >> > > >>> > Thanks,
> > > >> > > >>> > Dongjin
> > > >> > > >>> >
> > > >> > > >>> > On Tue, Jan 10, 2017 at 7:09 AM, Ismael Juma  wrote:
> > > >> > > >>> >
> > > >> > > >>> > > Thanks for posting the KIP, ZStandard looks like a nice
> > > >> > > improvement over
> > > >> > > >>> > > the existing compression algorithms. A couple of
> > questions:
> > > >> > > >>> > >
> > > >> > > >>> > > 1. Can you please elaborate on the details of the
> > benchmark?
> > > >> > > >>> > > 2. About https://github.com/luben/zstd-jni, can we rely
> > on
> > > >> it? A
> > > >> > > few
> > > >> > > >>> > > things
> > > >> > > >>> > > to consider: are there other projects using it, does it
> > have
> > > >> good
> > > >> > > test
> > > >> > > >>> > > coverage, are there performance tests, does it track
> > > upstream
> > > >> > > closely?
> > > >> > > >>> > >
> > > >> > > >>> > > Thanks,
> > > >> > > >>> > > Ismael
> > > >> > > >>> > >
> > > >> > > >>> > > On Fri, Jan 6, 2017 at 2:40 AM, Dongjin Lee  wrote:
> > > >> > > >>> > >
> > > >> > > >>> > > > Hi all,
> > > >> > > >>> > > >
> > > >> > > >>> > > > I've just posted a new KIP "KIP-110: Add Codec for
> > > ZStandard
> > > >> > > >>> > Compression"
> > > >> > > >>> > > > for
> > > >> > > >>> > > > discussion:
> > > >> > > >>> > > >
> > > >> > > >>> > > > https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-
> > > >> > > >>> > > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > >> > > >>> > > >
> > > >> > > >>> > > > Please have a look when you are free.
> > > >> > > >>> > > >
> > > >> > > >>> > > > Best,
> > > >> > > >>> > > > Dongjin
> > > >> > > >>> > > >
> > > >> > > >>> > > > --
> > > >> > > >>> > > > *Dongjin Lee*
> > > >> > > >>> > > >
> > > >> > > >>> > > >
> > > >> > > >>> > > > *Software developer in Line+.So interested in
> > > massive-scale
> > > >> > > machine
> > > >> > > >>> > > > learning.facebook: www.facebook.com/dongjin.lee.kr
> > > >> > > >>> > > > linkedin:
> > > >> > > >>> > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > >>> > > > github:
> > > >> > > >>> > > > github.com/dongjinleekr
> > > >> > > >>> > > > twitter: www.twitter.com/dongjinleekr
> > > >> > > >>> > > > *
> > > >> > > >>> > > >
> > > >> > > >>> > >
> > > >> > > >>> >
> > > >> > > >>> >
> > > >> > > >>> >
> > > >> > > >>> > --
> > > >> > > >>> > *Dongjin Lee*
> > > >> > > >>> >
> > > >> > > >>> >
> > > >> > > >>> > *Software developer in Line+.So interested in
> massive-scale
> > > >> machine
> > > >> > > >>> > learning.facebook: www.facebook.com/dongjin.lee.kr
> > > >> > > >>> > linkedin:
> > > >> > > >>> > kr.linkedin.com/in/dongjinleekr
> > > >> > > >>> > github:
> > > >> > > >>> > github.com/dongjinleekr
> > > >> > > >>> > twitter: www.twitter.com/dongjinleekr
> > > >> > > >>> > *
> > > >> > > >>> >
> > > >> > > >>>
> > > >> > > >>>
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > *Dongjin Lee*
> > > >> > > >
> > > >> > > >
> > > >> > > > *Software developer in Line+.So interested in massive-scale
> > > machine
> > > >> > > > learning.facebook: www.facebook.com/dongjin.lee.kr
> > > >> > > > <http://www.facebook.com/dongjin.lee.kr>linkedin:
> > > >> kr.linkedin.com/in/
> > > >> > > dongjinleekr
> > > >> > > > <http://kr.linkedin.com/in/dongjinleekr>github:
> > > >> > > > <http://goog_969573159/>github.com/dongjinleekr
> > > >> > > > <http://github.com/dongjinleekr>twitter:
> > > >> www.twitter.com/dongjinleekr
> > > >> > > > <http://www.twitter.com/dongjinleekr>*
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > >
> > > > *Software developer in Line+.So interested in massive-scale machine
> > > > learning.facebook: www.facebook.com/dongjin.lee.kr
> > > > <http://www.facebook.com/dongjin.lee.kr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <http://kr.linkedin.com/in/dongjinleekr>github:
> > > > <http://goog_969573159/>github.com/dongjinleekr
> > > > <http://github.com/dongjinleekr>twitter:
> www.twitter.com/dongjinleekr
> > > > <http://www.twitter.com/dongjinleekr>*
> > > >
> > >
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > >
> > > *Software developer in Line+.So interested in massive-scale machine
> > > learning.facebook: www.facebook.com/dongjin.lee.kr
> > > <http://www.facebook.com/dongjin.lee.kr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > <http://kr.linkedin.com/in/dongjinleekr>github:
> > > <http://goog_969573159/>github.com/dongjinleekr
> > > <http://github.com/dongjinleekr>twitter: www.twitter.com/dongjinleekr
> > > <http://www.twitter.com/dongjinleekr>*
> > >
> >
>



-- 
*Dongjin Lee*


*Software developer in Line+.So interested in massive-scale machine
learning.facebook: www.facebook.com/dongjin.lee.kr
<http://www.facebook.com/dongjin.lee.kr>linkedin:
kr.linkedin.com/in/dongjinleekr
<http://kr.linkedin.com/in/dongjinleekr>github:
<http://goog_969573159/>github.com/dongjinleekr
<http://github.com/dongjinleekr>twitter: www.twitter.com/dongjinleekr
<http://www.twitter.com/dongjinleekr>*

Reply via email to