I just updated the draft PR. Please have a look when you are free. Following the update, I have a minor update to the KIP document - 'compression.snappy.level' and 'compression.zstd.buffer.size' were removed since these options are meaningless. (Snappy doesn't support compression level, and zstd doesn't support buffer size.)
If there are no other feedbacks, I hope to put this proposal to the vote by next Monday, Jan 21. Thanks, Dongjin On Tue, Jan 15, 2019 at 3:17 AM Dongjin Lee <dong...@apache.org> wrote: > I just realized that there was a missing hole in the KIP, so I fixed it. > The draft implementation will be updated soon. > > In short, the proposed change did not regard the case of the topic or > broker's 'compression.type' is 'producer'; in this case, the broker has to > handle all kinds of the supported codec. So I added additional options > (compression.[gzip,snappy,lz4, zstd].level, compression.[gzip,snappy,lz4, > zstd].buffer.size) with handling routines. > > Please have a look when you are free. > > Thanks, > Dongjin > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <dong...@apache.org> wrote: > >> Thanks for pointing out Ismael. It's now updated. >> >> Best, >> Dongjin >> >> On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <isma...@gmail.com> wrote: >> >>> Thanks Dongjin. One minor suggestion: we should mention that the broker >>> side configs are also topic configs (i.e. can be set for a given topic). >>> >>> Ismael >>> >>> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <dong...@apache.org wrote: >>> >>> > Happy new year. >>> > >>> > I just updated the title and contents of KIP and Jira issue, with >>> updated >>> > draft implementation. Now both of compression level and buffer size >>> options >>> > are available to producer and broker configuration. You can check the >>> > updated KIP from modified URL: >>> > >>> > >>> > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression >>> > >>> > Please have a look when you are free. >>> > >>> > Thanks, >>> > Dongjin >>> > >>> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <isma...@gmail.com> wrote: >>> > >>> > > The updated title sounds fine to me. >>> > > >>> > > Ismael >>> > > >>> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dong...@apache.org wrote: >>> > > >>> > > > Hi Ismael, >>> > > > >>> > > > Got it. Your direction is perfectly reasonable. I am now updating >>> the >>> > KIP >>> > > > document and the implementation. >>> > > > >>> > > > By allowing the buffer/block size to be configurable, it would be >>> > better >>> > > to >>> > > > update the title of the KIP like 'Allow fine-grained configuration >>> for >>> > > > compression'. Is that right? >>> > > > >>> > > > @Other committers: >>> > > > >>> > > > Is there any other opinion on allowing the buffer/block size to be >>> > > > configurable? >>> > > > >>> > > > Thanks, >>> > > > Dongjin >>> > > > >>> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <ism...@juma.me.uk> >>> wrote: >>> > > > >>> > > > > Hi Dongjin, >>> > > > > >>> > > > > To clarify, I mean a broker topic config with regards to point >>> 1. As >>> > > you >>> > > > > know, compression can be done by the producer and/or by the >>> broker. >>> > The >>> > > > > default is for the broker to just use whatever compression was >>> used >>> > by >>> > > > the >>> > > > > producer, but this can be changed by the user on a per topic >>> basis. >>> > It >>> > > > > seems like it would make sense for the configs to be . consistent >>> > > between >>> > > > > producer and broker. >>> > > > > >>> > > > > For point 2, I haven't looked at the implementation, but we >>> could do >>> > it >>> > > > in >>> > > > > the `CompressionType` enum by invoking the right constructor or >>> > > > retrieving >>> > > > > the default value via a constant (if defined). That's an >>> > implementation >>> > > > > detail and can be discussed in the PR. The more general point is >>> to >>> > > rely >>> > > > on >>> > > > > the library defaults instead of choosing one ourselves. >>> > > > > >>> > > > > For point 3, I'm in favour of doing that in this KIP. >>> > > > > >>> > > > > Ismael >>> > > > > >>> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <dong...@apache.org> >>> > > wrote: >>> > > > > >>> > > > > > Thank you Ismael, here are the answers: >>> > > > > > >>> > > > > > *1. About topic config* >>> > > > > > >>> > > > > > After some consideration, I concluded that topic config doesn't >>> > need >>> > > to >>> > > > > > support compression.level. Here is why: since the compression >>> is >>> > > > > conducted >>> > > > > > by the client, the one who can select the best compression >>> level is >>> > > the >>> > > > > > client itself. Let us assume that the compression level is set >>> at >>> > the >>> > > > > topic >>> > > > > > config level. In that case, there is a possibility that the >>> > > compression >>> > > > > > level is not optimal for some producers. Actually, Kafka's go >>> > client >>> > > > also >>> > > > > > supports compression level functionality for the producer >>> config >>> > > only. >>> > > > > > <https://github.com/Shopify/sarama/blob/master/config.go> >>> (wait, >>> > do >>> > > we >>> > > > > > need >>> > > > > > to add this reasoning in the KIP, rejected alternatives >>> section?) >>> > > > > > >>> > > > > > *2. About default level* >>> > > > > > >>> > > > > > As of current draft implementation, the default compression is >>> set >>> > on >>> > > > the >>> > > > > > CompressionType enum. Of course, changing this strategy into >>> > relying >>> > > > on a >>> > > > > > method from the library to pick the default compression level >>> seems >>> > > > > > possible, like `GZIPBlockOutputStream` does. In this case, we >>> need >>> > to >>> > > > add >>> > > > > > similar wrapper class for zstd and modify lz4 the wrapper >>> also. Add >>> > > to >>> > > > > > this, it seems like we need to explicitly state that we follow >>> the >>> > > > > default >>> > > > > > compression level of the codec in the documentation. Is this >>> what >>> > you >>> > > > > > intended? >>> > > > > > >>> > > > > > *3. Whether to allow the buffer/block size to be configurable* >>> > > > > > >>> > > > > > Well, As of current draft implementation, the lz4 level is >>> > > implemented >>> > > > as >>> > > > > > block size; this is caused by my misunderstanding on lz4. After >>> > > > reviewing >>> > > > > > lz4 today, I found that it also supports compression level of >>> 1~16 >>> > > > > > (default: 1), not block size. I will fix it in this weekend by >>> > > updating >>> > > > > the >>> > > > > > wrapper class. >>> > > > > > >>> > > > > > For the problem of the buffer/block size, I have no strong >>> opinion. >>> > > If >>> > > > > the >>> > > > > > community needs it, I will do it all together. How do you >>> think? >>> > > > > > >>> > > > > > In short, it seems like I need to update the KIP document for >>> issue >>> > > #1 >>> > > > > and >>> > > > > > update the compression wrapper for issue #2, #3. Is this okay? >>> > > > > > >>> > > > > > Thanks, >>> > > > > > Dongjin >>> > > > > > >>> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma < >>> isma...@gmail.com> >>> > > > wrote: >>> > > > > > >>> > > > > > > Thanks for the KIP, this is helpful. A few questions: >>> > > > > > > >>> > > > > > > 1. Have we considered whether we want to allow a similar >>> topic >>> > > > config? >>> > > > > > > 2. Can we rely on a method from the library to pick the >>> default >>> > > > > > compression >>> > > > > > > level if compression.level is not set? We do it for gzip and >>> it >>> > > would >>> > > > > > seem >>> > > > > > > reasonable to do something similar for the other compression >>> > > > libraries. >>> > > > > > > 3. Do we want to allow the buffer/block size to be >>> configurable? >>> > > This >>> > > > > has >>> > > > > > > an impact on memory usage and people may want to trade >>> > compression >>> > > > for >>> > > > > > > less/more memory in some cases. For example, the default for >>> LZ4 >>> > is >>> > > > > 64KB >>> > > > > > > which is a bit high. >>> > > > > > > >>> > > > > > > Ismael >>> > > > > > > >>> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee < >>> dong...@apache.org >>> > > wrote: >>> > > > > > > >>> > > > > > > > Hello dev, >>> > > > > > > > >>> > > > > > > > I hope to initiate the discussion of KIP-390: Add producer >>> > option >>> > > > to >>> > > > > > > adjust >>> > > > > > > > compression level >>> > > > > > > > < >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level >>> > > > > > > > >. >>> > > > > > > > All feedbacks will be highly appreciated. >>> > > > > > > > >>> > > > > > > > Best, >>> > > > > > > > Dongjin >>> > > > > > > > >>> > > > > > > > -- >>> > > > > > > > *Dongjin Lee* >>> > > > > > > > >>> > > > > > > > *A hitchhiker in the mathematical world.* >>> > > > > > > > >>> > > > > > > > *github: <http://goog_969573159/>github.com/dongjinleekr >>> > > > > > > > <http://github.com/dongjinleekr>linkedin: >>> > > > > > > kr.linkedin.com/in/dongjinleekr >>> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare: >>> > > > > > > > www.slideshare.net/dongjinleekr >>> > > > > > > > <http://www.slideshare.net/dongjinleekr>* >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > > >>> > > > > > -- >>> > > > > > *Dongjin Lee* >>> > > > > > >>> > > > > > *A hitchhiker in the mathematical world.* >>> > > > > > *github: <http://goog_969573159/>github.com/dongjinleekr >>> > > > > > <https://github.com/dongjinleekr>linkedin: >>> > > > > kr.linkedin.com/in/dongjinleekr >>> > > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: >>> > > > > > speakerdeck.com/dongjin >>> > > > > > <https://speakerdeck.com/dongjin>* >>> > > > > > >>> > > > > >>> > > > >>> > > > >>> > > > -- >>> > > > *Dongjin Lee* >>> > > > >>> > > > *A hitchhiker in the mathematical world.* >>> > > > *github: <http://goog_969573159/>github.com/dongjinleekr >>> > > > <https://github.com/dongjinleekr>linkedin: >>> > > kr.linkedin.com/in/dongjinleekr >>> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: >>> > > > speakerdeck.com/dongjin >>> > > > <https://speakerdeck.com/dongjin>* >>> > > > >>> > > >>> > >>> > >>> > -- >>> > *Dongjin Lee* >>> > >>> > *A hitchhiker in the mathematical world.* >>> > *github: <http://goog_969573159/>github.com/dongjinleekr >>> > <https://github.com/dongjinleekr>linkedin: >>> kr.linkedin.com/in/dongjinleekr >>> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: >>> > speakerdeck.com/dongjin >>> > <https://speakerdeck.com/dongjin>* >>> > >>> >> >> >> -- >> *Dongjin Lee* >> >> *A hitchhiker in the mathematical world.* >> *github: <http://goog_969573159/>github.com/dongjinleekr >> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr >> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin >> <https://speakerdeck.com/dongjin>* >> > > > -- > *Dongjin Lee* > > *A hitchhiker in the mathematical world.* > *github: <http://goog_969573159/>github.com/dongjinleekr > <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin > <https://speakerdeck.com/dongjin>* > -- *Dongjin Lee* *A hitchhiker in the mathematical world.* *github: <http://goog_969573159/>github.com/dongjinleekr <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin <https://speakerdeck.com/dongjin>*