Hi Dongjin, Hi Kafka devs,

Thanks a lot for opening this KIP, and hats off for the amount of benchmarking and investigation you've done! It's great to see a follow-up to KIP-390 that digs deeper into these compression-level options with solid data to back it up.

One thing I wanted to clarify: what specific compression options are we targeting here? From what I saw in the related PR [1], it seems we're mostly exposing block and window sizes. But many codecs expose more than that:

- GZIP has options like strategy, window size, and buffer size.
- LZ4 supports block size (64KB–4MB), block mode (independent vs linked), checksums, and dictionaries.
- Snappy, as far as I know, doesn't expose much for tuning.
- ZSTD has a huge set: threading, window size, block size, dictionaries, long-distance matching, checksums, etc. It's a beast in terms of configurability.

So I'm curious: is the intent of this KIP to eventually support a broader set of codec-specific settings, or are we intentionally scoping it down to just block/window size for now?

Also, just to check: are you still interested in implementing this KIP (i.e., KIP-780)? If not, would you be open to me taking it over or helping move it forward? Of course, only if that works for you. I'd be happy to coordinate if there's still interest in pursuing this.

Looking forward to your thoughts!

Best,
Maros Orsak

[1] - https://github.com/apache/kafka/pull/11388/files

On 2021/10/18 07:35:12 Dongjin Lee wrote:
> Hi Ismael, do you have any opinion on this approach and benchmark results?
>
> Thanks,
> Dongjin
>
> On Wed, Oct 13, 2021 at 8:09 PM Dongjin Lee <do...@apache.org> wrote:
>
> > Hi Chen,
> >
> > > It said, available value is [10, 22], but default is a value out of that
> > > range, which should be wrong.
> >
> > Oh yes, it was a mistake! Thank you for reading the proposal so carefully.
> > '0 or [10, 22]' is right. (I just fixed it.)
> >
> > Best,
> > Dongjin
> >
> > On Wed, Oct 13, 2021 at 6:17 PM Luke Chen <sh...@gmail.com> wrote:
> >
> >> Hi Dongjin,
> >> Thanks for the KIP, and the benchmark results. It makes sense to me.
> >>
> >> Just one question:
> >> > compression.zstd.window: enables long mode; the log of the window size
> >> > that zstd uses to memorize the compressing data. (available: [10, 22],
> >> > default: 0 (disables long mode.))
> >>
> >> It said, available value is [10, 22], but default is a value out of that
> >> range, which should be wrong.
> >>
> >> Thank you.
> >> Luke
> >>
> >> On Sun, Oct 10, 2021 at 9:50 PM Dongjin Lee <do...@apache.org> wrote:
> >>
> >> > Hi Kafka dev,
> >> >
> >> > I would like to start the discussion of KIP-780: Support fine-grained
> >> > compression options.
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-780%3A+Support+fine-grained+compression+options
> >> >
> >> > Here is some context or history on this feature; initially, this feature
> >> > was intended to be a part of KIP-390: Support Compression Level, but when I
> >> > was working on it, I could not find the evidence that these options can
> >> > improve the performance, so it was excluded from the final proposal. Since
> >> > this (tentative) conclusion was somewhat strange, KIP-390 was passed under
> >> > the condition that a following work should be done for the
> >> > buffer/block/window-related configuration options.
> >> >
> >> > And after some repetitive prototypes and benchmarks, it seems like I
> >> > finally found the evidence. It is why I am submitting it as a separate
> >> > proposal now. The document also includes what I found during the tests in
> >> > the Benchmark section.
> >> >
> >> > All kinds of feedbacks are greatly appreciated!
> >> >
> >> > Best,
> >> > Dongjin
> >> >
> >> > --
> >> > *Dongjin Lee*
> >> >
> >> > *A hitchhiker in the mathematical world.*
> >> >
> >> > github: github.com/dongjinleekr
> >> > keybase: https://keybase.io/dongjinleekr
> >> > linkedin: kr.linkedin.com/in/dongjinleekr
> >> > speakerdeck: speakerdeck.com/dongjin