> I am OK with doing compression level first, but I don't want to rule out the buffer size change without understanding better.
I see. I am now retrying buffer size configuration & benchmark. As soon as I get a promising result, I will update the KIP.

Thanks,
Dongjin

On Wed, Jun 9, 2021 at 12:36 AM Ismael Juma <ism...@juma.me.uk> wrote:

> Btw, I am OK with doing compression level first, but I don't want to rule out the buffer size change without understanding better.
>
> Ismael
>
> On Tue, Jun 8, 2021 at 8:33 AM Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Hi Dongjin,
> >
> > I was thinking of a simple test: Snappy with 1 KB block size vs 32 KB block size. If the compression rate is similar for both, then it seems very wasteful to use 32 KB. I suspect you will see a significant difference though.
> >
> > Ismael
> >
> > On Tue, Jun 8, 2021 at 8:27 AM Dongjin Lee <dong...@apache.org> wrote:
> >
> >> Hi Ismael,
> >>
> >> I added the linear write benchmark result to the proposal. Like the producer benchmark, the least compression level showed the best MB/sec for any case. I tested several configurations, but the result was almost the same.
> >>
> >> If you have any proposals for the benchmark, don't hesitate to give me a suggestion. I am a newbie to run the linear write benchmark.
> >>
> >> Best,
> >> Dongjin
> >>
> >> On Sun, Jun 6, 2021 at 8:20 AM Dongjin Lee <dong...@apache.org> wrote:
> >>
> >> > Hi Ismael,
> >> >
> >> > Thanks for the reply.
> >> >
> >> > > So you're saying that reducing the buffer size didn't reduce the compression rate for codecs like lz4?
> >> >
> >> > Of course, there were some improvements in compressed size when I tried the 'buffer.size' option, but the gain was not significant. I tried several datasets, but the result was the same. It made me so skeptical about adding this option, which seemed to make the configuration option complex only.
> >> >
> >> > In contrast, 'compression.level' showed its effectiveness immediately.
> >> > It is why I decided to focus on the 'compression.level' in this rework.
> >> >
> >> > As you can see in the update KIP with the benchmark, IMHO, the true value of supporting the compression option may not be the compressed size or rate, but speed. By tweaking the compression level slightly, it showed great produce performance gain.
> >> >
> >> > Thanks,
> >> > Dongjin
> >> >
> >> > On Sun, Jun 6, 2021 at 6:48 AM Ismael Juma <ism...@juma.me.uk> wrote:
> >> >
> >> >> Thanks Dongjin. So you're saying that reducing the buffer size didn't reduce the compression rate for codecs like lz4? If so, that would suggest reducing the default value, but that seems odd.
> >> >>
> >> >> Ismael
> >> >>
> >> >> On Sat, Jun 5, 2021, 9:25 AM Dongjin Lee <dong...@apache.org> wrote:
> >> >>
> >> >> > Hello Kafka dev,
> >> >> >
> >> >> > I hope to reboot the discussion of KIP-390: Support Compression Level <https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level>. It proposes to add a new option, 'compression.level', that controls the compression level.
> >> >> >
> >> >> > This KIP has been submitted more than one year ago, but had been neglected for a long time. Recently I reworked it from scratch with the following differences:
> >> >> >
> >> >> > 1. Tested how it works with a real-world dataset. As you can see in the updated KIP, *this feature can improve the producer's message/second rate by more than 50%*, such a significant enhancement.
> >> >> > 2. Dropped 'compression.buffer.size' option that was in the initial work. With the repeated benchmarks, I could not find any evidence this option results in meaningful differences. So I removed it.
> >> >> >
> >> >> > All feedback will be highly appreciated.
> >> >> >
> >> >> > Best,
> >> >> > Dongjin
> >> >> >
> >> >> > --
> >> >> > *Dongjin Lee*
> >> >> >
> >> >> > *A hitchhiker in the mathematical world.*
> >> >> >
> >> >> > github: github.com/dongjinleekr <https://github.com/dongjinleekr>
> >> >> > keybase: https://keybase.io/dongjinleekr
> >> >> > linkedin: kr.linkedin.com/in/dongjinleekr <https://kr.linkedin.com/in/dongjinleekr>
> >> >> > speakerdeck: speakerdeck.com/dongjin <https://speakerdeck.com/dongjin>
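
For readers following the thread, the block-size comparison Ismael suggests (1 KB vs 32 KB) could be sketched roughly as below. This is only an illustration under stated assumptions: it uses Python's standard zlib module as a stand-in for Snappy or LZ4, the sample data is synthetic, and `compressed_size` and `make_sample` are hypothetical helper names that are not part of Kafka or KIP-390. Actual results depend on the codec and the dataset.

```python
import zlib


def compressed_size(data: bytes, block_size: int, level: int) -> int:
    """Compress `data` as independent blocks of `block_size` bytes.

    Returns the total compressed size in bytes. Each block is compressed
    on its own, so smaller blocks give the compressor less context.
    """
    total = 0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        total += len(zlib.compress(block, level))
    return total


def make_sample(n_records: int = 2000) -> bytes:
    """Build a synthetic, JSON-like payload (an assumption, not real Kafka data)."""
    return b"".join(
        b'{"id": %d, "user": "user-%d", "status": "active"}\n' % (i, i % 50)
        for i in range(n_records)
    )


if __name__ == "__main__":
    sample = make_sample()
    # Compare the two block sizes from the thread across a few zlib levels.
    for block_size in (1024, 32 * 1024):
        for level in (1, 6, 9):
            size = compressed_size(sample, block_size, level)
            ratio = len(sample) / size
            print(f"block={block_size:>6} level={level} ratio={ratio:.2f}")
```

If the ratios for the two block sizes come out close on a given dataset, that would support the argument that the larger buffer is wasteful; a large gap would support keeping it configurable.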