Hi Dongjin, hi Kafka devs,

Thanks a lot for opening this KIP, and hats off for the amount of benchmarking
and investigation you’ve done! It’s great to see a follow-up to KIP-390 that
digs deeper into these compression options with solid data to back it up.

One thing I wanted to clarify: what specific compression options are we 
targeting here? From what I saw in the related PR [1], it seems we’re mostly 
exposing block and window sizes. But many codecs expose more than that:

- GZIP: strategy, window size, and buffer size.
- LZ4: block size (64 KB to 4 MB), block mode (independent vs. linked),
  checksums, and dictionaries.
- Snappy: as far as I know, doesn’t expose much for tuning.
- ZSTD: a huge set, including threading, window size, block size,
  dictionaries, long-distance matching, and checksums. It’s a beast in terms
  of configurability.
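
To make the GZIP item concrete, here is a rough sketch of what tuning window
size and strategy looks like at the zlib level. It is written in Python only
for brevity; it is not Kafka code and not what the PR does. The helper names
(gzip_compress, gzip_decompress) and the window_log parameter are made up for
illustration; only the zlib calls themselves are real.

```python
import zlib


def gzip_compress(data: bytes, level: int = 6, window_log: int = 15,
                  strategy: int = zlib.Z_DEFAULT_STRATEGY) -> bytes:
    """Hypothetical helper: gzip-compress `data` with explicit zlib knobs."""
    if not 9 <= window_log <= 15:
        raise ValueError("window_log must be in [9, 15]")
    # wbits = 16 + window_log selects gzip framing with a 2**window_log window.
    compressor = zlib.compressobj(
        level=level,
        method=zlib.DEFLATED,
        wbits=16 + window_log,
        memLevel=8,
        strategy=strategy,
    )
    return compressor.compress(data) + compressor.flush()


def gzip_decompress(blob: bytes) -> bytes:
    """Hypothetical helper: decode a gzip stream produced by gzip_compress."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    return zlib.decompress(blob, wbits=16 + zlib.MAX_WBITS)


payload = b"kafka-record-" * 2000

default_out = gzip_compress(payload)
small_window = gzip_compress(payload, window_log=9)
filtered = gzip_compress(payload, strategy=zlib.Z_FILTERED)
```

All three outputs are ordinary gzip streams that decode to the same payload;
only the encoder-side settings differ, which is the kind of per-codec knob I
have in mind.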

So I’m curious: is the intent of this KIP to eventually support a broader set
of codec-specific settings, or are we intentionally scoping it down to just
block/window size for now?

Also, just to check: are you still planning to implement KIP-780? If not,
would you be open to me taking it over or helping to move it forward? Only if
that works for you, of course; I’d be happy to coordinate if you are still
interested in pursuing it.

Looking forward to your thoughts!

Best,

Maros Orsak

[1] - https://github.com/apache/kafka/pull/11388/files


On 2021/10/18 07:35:12 Dongjin Lee wrote:
> Hi Ismael, do you have any opinion on this approach and benchmark results?
> 
> Thanks,
> Dongjin
> 
> On Wed, Oct 13, 2021 at 8:09 PM Dongjin Lee <do...@apache.org> wrote:
> 
> > Hi Chen,
> >
> > > It said, available value is [10, 22], but default is a value out of that
> > > range, which should be wrong.
> >
> > Oh yes, it was a mistake! Thank you for reading the proposal so carefully.
> > '0 or [10, 22]' is right. (I just fixed it.)
> >
> > Best,
> > Dongjin
> >
> > On Wed, Oct 13, 2021 at 6:17 PM Luke Chen <sh...@gmail.com> wrote:
> >
> >> Hi Dongjin,
> >> Thanks for the KIP, and the benchmark results. It makes sense to me.
> >>
> >> Just one question:
> >> > compression.zstd.window: enables long mode; the log of the window size
> >> > that zstd uses to memorize the compressing data. (available: [10, 22],
> >> > default: 0 (disables long mode.))
> >>
> >> It said, available value is [10, 22], but default is a value out of that
> >> range, which should be wrong.
> >>
> >> Thank you.
> >> Luke
> >>
> >> On Sun, Oct 10, 2021 at 9:50 PM Dongjin Lee <do...@apache.org> wrote:
> >>
> >> > Hi Kafka dev,
> >> >
> >> > I would like to start the discussion of KIP-780: Support fine-grained
> >> > compression options.
> >> >
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-780%3A+Support+fine-grained+compression+options
> >> >
> >> > Here is some context and history on this feature. Initially, it was
> >> > intended to be a part of KIP-390: Support Compression Level, but when I
> >> > was working on it, I could not find evidence that these options
> >> > improve performance, so it was excluded from the final proposal. Since
> >> > this (tentative) conclusion was somewhat strange, KIP-390 was passed
> >> > under the condition that follow-up work be done on the
> >> > buffer/block/window-related configuration options.
> >> >
> >> > After repeated prototypes and benchmarks, it seems I finally found the
> >> > evidence, which is why I am submitting this as a separate proposal
> >> > now. The Benchmark section of the document describes what I found
> >> > during the tests.
> >> >
> >> > All kinds of feedback are greatly appreciated!
> >> >
> >> > Best,
> >> > Dongjin
> >> >
> >>
> >
> >
> 
> 
> -- 
> *Dongjin Lee*
> 
> *A hitchhiker in the mathematical world.*
> 
> 
> 
> *github: github.com/dongjinleekr <https://github.com/dongjinleekr>
> keybase: https://keybase.io/dongjinleekr <https://keybase.io/dongjinleekr>
> linkedin: kr.linkedin.com/in/dongjinleekr <https://kr.linkedin.com/in/dongjinleekr>
> speakerdeck: speakerdeck.com/dongjin <https://speakerdeck.com/dongjin>*
> 
